Comparative Analysis of SDG Implementation Evolution Worldwide

Author

Lodrik Adam, Sofia Benczédi, Stefan Favre, Delia Fuchs

Published

December 6, 2023

1 Introduction

1.1 Overview and Motivation

The global significance of the SDGs is the basis of our project. The adoption of the SDGs by the United Nations in 2015 marked a significant global commitment to address pressing issues such as poverty, inequality, climate change, and more. The fact that these goals were unanimously adopted by 193 member states underscores their importance. This prompted us to ask ourselves: can we evaluate the progress? What has really been done so far? Although the SDGs have attracted considerable attention and backing, it is essential to evaluate the events preceding and following their implementation. Understanding the actions taken and the progress made helps determine whether these global commitments are resulting in tangible improvements to individuals’ lives. By examining the evolution of all countries and their respective contributions towards achieving the SDGs, we can develop a comprehensive understanding of collective efforts and identify potential disparities or gaps.

1.3 Research questions

  1. Focus on factors: What can explain the state of the countries regarding sustainable development? (we will analyse different factors: scores from the human freedom index, GDP per capita, military expenditures in % of GDP/government expenditure, unemployment rate, internet usage). See data description for more precise information about the factors.

  2. Focus on time: How has the adoption of the SDGs in 2015 influenced the achievement of SDGs? (we want to compare the achievement (SDG scores: there are scores calculated even before the adoption) of the different countries before and after 2015 to see if the adoption of SDG gave a real “push” to sustainable development)

  3. Focus on events: Is the evolution in sustainable development influenced by uncontrollable events, such as economic crises, health crises and natural disasters? (we will analyse the impact of COVID-19, natural disasters and conflicts (# deaths, damages, etc.) on the SDG scores). See data description for more precise information about how the impact of these events is captured in the data.

  4. Focus on relationship between SDGs: How are the different SDGs linked? (We want to see if some SDGs are linked in the fact that a high score on one implies a high score on the other, and thus if we can make groups of SDGs that are comparable in that way).

2 Data

2.1 Sources

We collect our data from the Sustainable Development Report (SDG), the International Labour Organization (ILOSTAT), the World Bank, Our World in Data, the Cato Institute, Kaggle (for disasters, as we could not find relevant accessible information elsewhere) and GitHub. We found different datasets containing useful information related to the SDGs. The details about these data and the links are presented in the next section. Using the kableExtra package, we list our sources and the corresponding links below:

Name of the Table Source
D1_1_SDG dashboards.sdgindex.org
D2_2_Unemployment_rate ilo.org
D3_0_GDP_per_capita data.worldbank.org
D3_1_Military_expenditure_percent_GDP data.worldbank.org
D3_2_Military_expenditure_percent_gov_exp data.worldbank.org
D4_0_Internet_usage ourworldindata.org
D5_0_Human_freedom_index cato.org
D6_0_Disasters kaggle.com
D7_0_COVID github.com
D8_0_Conflicts datacatalog.worldbank.org

2.2 Description

During the wrangling process, we added data from the other datasets to our main table (D1_1_SDG), matching them on the country code and the year. The tables below show all the variables present in our 9 databases. We will then merge them to obtain our final table for the analysis.
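The matching on country code and year can be sketched as a left join. This is only a minimal base R illustration: the two toy tables and their values are hypothetical, not taken from the project data.

```r
# Hypothetical toy tables standing in for the main SDG table and one
# of the additional datasets (values are made up for illustration).
sdg <- data.frame(
  code = c("CHE", "CHE", "FRA"),
  year = c(2020L, 2021L, 2020L),
  overallscore = c(79.1, 79.4, 80.2)
)
gdp <- data.frame(
  code = c("CHE", "FRA"),
  year = c(2020L, 2020L),
  GDPpercapita = c(85000, 39000)
)

# Left join on (code, year): every row of the main table is kept,
# and rows without a match get NA in the added column.
merged <- merge(sdg, gdp, by = c("code", "year"), all.x = TRUE)
merged
```

Keeping all rows of the main table (all.x = TRUE) means a missing match shows up as an NA rather than as a silently dropped observation.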

2.2.1 Our databases

Sustainable Development Goals database (D1_1_SDG)

Our primary database focuses on the Sustainable Development Goals (SDG). Below is a table summarizing the key variables included:

Variable Name Explanation
code Country code (ISO)
country Country name
year Year of the observation (2000-2022)
overallscore Overall score on all 17 SDGs (the scores are the % of achievement of the goals determined by the UN based on several indicators)
goal1:goal17 Score on each SDG except SDG 14 (16 variables)
population Population of the country

The Sustainable Development Goals (SDGs) are a universal set of 17 interlinked goals that were adopted by the United Nations in 2015 as part of the 2030 Agenda for Sustainable Development. These goals provide a shared blueprint for peace and prosperity for people and the planet, now and into the future.

Unemployment rate database (D2_2_Unemployment_rate)

Variable Name Explanation
code Country code (ISO)
country Country name
year Year of the observation (2000-2022)
unemployment.rate Unemployment rate (% of the population 15 years old and older)

GDP per capita database (D3_0_GDP_per_capita)

Variable Name Explanation
code Country code (ISO)
country Country name
year Year of the observation (2000-2022)
GDPpercapita GDP per capita

Proportion of the GDP dedicated to Military expenditures database (D3_1_Military_expenditure_percent_GDP)

Variable Name Explanation
code Country code (ISO)
country Country name
year Year of the observation (2000-2022)
MilitaryExpenditurePercentGDP Military expenditures in percentage of GDP

Internet usage database (D4_0_Internet_usage)

Variable Name Explanation
code Country code (ISO)
country Country name
year Year of the observation (2000-2022)
internet.usage Internet usage (% of the population)

Human freedom index database (D5_0_Human_freedom_index)

Variable Name Explanation
code Country code (ISO)
country Country name
year Year of the observation (2000-2022)
region Part of the world, group of countries (e.g. Eastern Europe, Sub-Saharan Africa, South Asia, etc.)
pf_law Rule of law, mean score of: Procedural justice, Civil justice, Criminal justice, Rule of law (V-Dem)
pf_security Security and safety, mean score of: Homicide, Disappearances, conflicts, and terrorism
pf_movement Freedom of movement (V-Dem), Freedom of movement (CLD)
pf_religion Freedom of religion, Religious organization repression
pf_assembly Civil society entry and exit, Freedom of assembly, Freedom to form/run political parties, Civil society repression
pf_expression Direct attacks on the press, Media and expression (V-Dem), Media and expression (Freedom House), Media and expression (BTI), Media and expression (CLD)
pf_identity Same-sex relationships, Divorce, Inheritance rights, Female genital mutilation
ef_government Government consumption, Transfers and subsidies, Government investment, Top marginal tax rate, State ownership of assets
ef_legal Judicial independence, Impartial courts, Protection of property rights, Military interference, Integrity of the legal system, Legal enforcement of contracts, Regulatory costs, Reliability of police
ef_money Money growth, Standard deviation of inflation, Inflation: Most recent year, Freedom to own foreign currency
ef_trade Tariffs, Regulatory trade barriers, Black-market exchange rates, Movement of capital and people
ef_regulation Credit market regulations, Labor market regulations, Business regulations

Disaster list database (D6_0_Disasters)

Variable Name Explanation
code Country code (ISO)
country Country name
year Year of the observation (2000-2022)
continent Continent affected by the disasters (e.g. floods, hurricanes)
total_deaths Number of deaths caused by disasters
no_injured Number of people injured by disasters
no_affected Number of people affected by disasters
no_homeless Number of people left homeless by disasters
total_affected Total number of people affected by disasters
total_damages Total of infrastructure damages (in thousand US$)

COVID database (D7_0_COVID)

Variable Name Explanation
code Country code (ISO)
country Country name
year Year of the observation (2000-2022)
deaths_per_million Number of COVID deaths per million people
cases_per_million Number of COVID cases per million people
stringency Government Response Stringency Index: composite measure based on 9 response indicators including school closures, workplace closures, and trave

Conflicts database (D8_0_Conflicts)

Variable Name Explanation
code Country code (ISO)
country Country name
year Year of the observation (2000-2022)
ongoing Variable coded 1 for more than 25 deaths in intrastate conflict and 0 otherwise according to UCDP/PRIO Armed Conflict Dataset 17.1.
sum_deaths Best estimate of deaths in all categories of violence (non-state, one-sided and state-based) recorded by the Uppsala Conflict Data Program in the country based on the UCDP GED dataset (unpublished 2016 data). The location of these events is used for estimating the extent of violence.
pop_affected Share of population affected by violence in percentage (0 to 100) measured as described above based on population data from CIESIN, the PRIO-GRID structure as well as UCDP GED.
area_affected Area affected by conflict
maxintensity Two different intensity levels are coded: minor armed conflicts (1) and wars (2), Takes the max intensity of conflict in the country so that it is coded 2 if there is at least one war (>=1000 deaths in intrastate conflict) during the year. Data from UCDP/PRIO Armed Conflict Dataset 17.1.

2.3 Wrangling/cleaning

To accommodate the large scale of the datasets, we pre-cleaned each one prior to merging. This streamlined the process, simplifying the cleaning of the final, combined dataset. The treatment of missing values will be taken care of after merging our datasets.

2.3.1 Dataset on SDG

This is our main dataset, which we clean in order to keep only the columns containing the following information: country name, country code, year, population, overall score and the SDG scores.

We start by importing the data and converting it into a DataFrame. Next, we rename the columns and convert the scores into numeric variables.

Code
D1_0_SDG <- read.csv(here("scripts","data","SDG.csv"), sep = ";")
D1_0_SDG <- as.data.frame(D1_0_SDG)

D1_0_SDG <- D1_0_SDG[,1:22]

colnames(D1_0_SDG) <- c("code", "country", "year", "population",
                        "overallscore", "goal1", "goal2", "goal3",
                        "goal4", "goal5", "goal6", "goal7", "goal8",
                        "goal9", "goal10", "goal11", "goal12",
                        "goal13", "goal14", "goal15", "goal16",
                        "goal17")

D1_0_SDG[["overallscore"]] <- as.double(gsub(",", ".", D1_0_SDG[["overallscore"]]))

makenumSDG <- function(D1_0_SDG) {
  for (i in 1:17) {
    varname <- paste("goal", i, sep = "")
    D1_0_SDG[[varname]] <- as.double(gsub(",", ".", D1_0_SDG[[varname]]))
  }
  return(D1_0_SDG)
}

D1_0_SDG <- makenumSDG(D1_0_SDG)

We proceed by examining the missing values.

Code
propmissing <- numeric(length(D1_0_SDG))

for (i in 1:length(D1_0_SDG)){
  proportion <- mean(is.na(D1_0_SDG[[i]]))
  propmissing[i] <- proportion
}
variable_names <- colnames(D1_0_SDG)
 
prop_missing_data <- data.frame(variable = variable_names, prop_missing = propmissing)

ggplot(prop_missing_data, aes(x = variable, y = prop_missing)) +
   geom_bar(stat = "identity", fill = "skyblue", color = "black") +
   labs(title = "NAs by columns in the main dataset",
        x = "Variable",
        y = "Proportion of Missing Values") +
   theme_minimal()+
   coord_flip()
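As a side note, the share of missing values per column can also be obtained without an explicit loop. The sketch below uses a hypothetical toy data frame (not D1_0_SDG itself) and base R only.

```r
# Hypothetical toy data frame standing in for the main dataset.
toy <- data.frame(
  code = c("AAA", "BBB", "CCC", "DDD"),
  population = c(10, NA, 30, NA),
  goal1 = c(50, 60, NA, 70)
)

# is.na() on a data frame returns a logical matrix;
# the column means are the proportions of missing values.
prop_missing <- colMeans(is.na(toy))

# Same two-column layout as the table used for plotting.
prop_missing_data <- data.frame(
  variable = names(prop_missing),
  prop_missing = unname(prop_missing)
)
prop_missing_data
```

The result has the same two columns (variable, prop_missing), so it could be passed to the same ggplot call.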

Observing that the ‘population’ column contains numerous NAs, we investigate and discover that these missing values belong to observations that represent regions, not countries. Therefore, we can safely exclude these observations.

Code

SDG0 <- D1_0_SDG %>%
  group_by(code) %>%
  select(population) %>%
  summarize(NaPop = mean(is.na(population))) %>%
  filter(NaPop != 0)

ggplot(SDG0, aes(x = code, y = NaPop)) +
  geom_bar(stat = "identity", fill = "lightgreen", color = "black") +
  labs(title = "NAs by row in population variable are for regions and not countries",
       x = "Code",
       y = "Proportion of Missing Values") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

D1_0_SDG <- D1_0_SDG %>%
  filter(!str_detect(code, "^_"))

Now, there are no missing values in the ‘population’ variable, and we observe that it contains information on 166 countries.

We notice that NAs are present in only three SDG scores: 1, 10, and 14. Additionally, when a country has NAs, they occur across all years or not at all. Consequently, we decide to conduct further investigations on these three SDG scores to determine whether to include them in our analysis.
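The "all years or not at all" pattern can be checked by computing the share of missing values per country and verifying that it is always 0 or 1. A minimal sketch on hypothetical toy data, in base R:

```r
# Hypothetical toy data: country BBB has no goal1 score in any year.
toy <- data.frame(
  code = rep(c("AAA", "BBB", "CCC"), each = 3),
  year = rep(2000:2002, times = 3),
  goal1 = c(50, 51, 52, NA, NA, NA, 60, 61, 62)
)

# Share of missing goal1 values per country.
na_share <- tapply(is.na(toy$goal1), toy$code, mean)

# TRUE when every country is either fully observed or fully missing.
all_or_nothing <- all(na_share %in% c(0, 1))

# Countries with no information at all for this goal.
fully_missing <- names(na_share)[na_share == 1]
```

If all_or_nothing is TRUE, removing the countries listed in fully_missing removes every NA for that goal.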

For goal 1, 9.04% of the countries (15 countries) have missing values. Goal 1 being “No Poverty”, we decide to keep it and only remove the countries with no information from the analysis.

Code
SDG2 <- D1_0_SDG |> 
  group_by(code) |> 
  select(contains("goal")) |> 
  summarize(Na1 = mean(is.na(goal1))) |>
  filter(Na1 != 0)
country_number <- length(unique(D1_0_SDG$country))
length(unique(SDG2$code))/country_number
#> [1] 0.0904

For goal 10, 10.2% of the countries (17 countries) have missing values. Goal 10 being “Reduced Inequalities”, we decide to keep it and only remove the countries with no information from the analysis.

Code
SDG3 <- D1_0_SDG |> 
  group_by(code) |> 
  select(contains("goal")) |> 
  summarize(Na10 = mean(is.na(goal10))) |>
  filter(Na10 != 0)

length(unique(SDG3$code))/country_number
#> [1] 0.102

For goal 14, 24.1% of the countries (40 countries) have missing values. Goal 14 being “Life Below Water”, we decide not to keep it, because other SDGs such as “Life on Land” and “Clean Water and Sanitation” already cover similar subjects.

Code
SDG4 <- D1_0_SDG |> 
  group_by(code) |> 
  select(contains("goal")) |> 
  summarize(Na14 = mean(is.na(goal14))) |>
  filter(Na14 != 0)

length(unique(SDG4$code))/country_number
#> [1] 0.241

D1_0_SDG <- D1_0_SDG %>% select(-goal14)

We will work with various datasets and merge them using the country code and year as key identifiers. To ensure accurate matching, we first verify that country names are encoded in UTF-8 format. Then, we standardize the names of the countries (requiring a custom match for Turkey) and the country codes, utilizing the countrycode library. Additionally, we compile a list of all country codes from the main database to filter the other datasets. Lastly, we complete the database to include all possible “country, year” combinations, ensuring the total number of rows remains unchanged.

Code
D1_0_SDG$country <- stri_encode(D1_0_SDG$country, to = "UTF-8")

D1_0_SDG <- D1_0_SDG %>%
  mutate(country = countrycode(country, "country.name", "country.name", custom_match = c("T�rkiye"="Turkey")))

D1_0_SDG$code <- countrycode(
  sourcevar = D1_0_SDG$code,
  origin = "iso3c",
  destination = "iso3c",
)

list_country <- c(unique(D1_0_SDG$code))

D1_0_SDG_country_list <- D1_0_SDG %>%
  filter(code %in% list_country) %>%
  select(code, country)

D1_0_SDG_country_list <- D1_0_SDG_country_list %>%
  select(code, country) %>%
  distinct()

Finally, we complete the database to ensure there are no missing pairs of (year, code).
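The code for this completion step is not shown here. One possible way to do it, sketched in base R on hypothetical toy data, is to build the full grid of (code, year) pairs and left-join the data onto it; tidyr::complete() would be a tidyverse alternative.

```r
# Hypothetical toy data: the pair (BBB, 2001) is absent.
toy <- data.frame(
  code = c("AAA", "AAA", "BBB"),
  year = c(2000L, 2001L, 2000L),
  overallscore = c(50, 51, 60)
)

# All possible (code, year) combinations.
grid <- expand.grid(
  code = unique(toy$code),
  year = seq(min(toy$year), max(toy$year)),
  stringsAsFactors = FALSE
)

# Left join onto the grid: absent pairs appear as rows with NA scores.
completed <- merge(grid, toy, by = c("code", "year"), all.x = TRUE)
completed <- completed[order(completed$code, completed$year), ]
```

Comparing nrow(completed) with nrow(toy) shows whether any pair was missing; if the two are equal, the table was already complete.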

Here are the first few lines of the cleaned dataset on SDG achievement scores:

For this first dataset, we reduced the size from 4,140 observations across 120 variables to 3,818 observations for 21 variables.

As said, this is now our main dataset. All subsequent datasets will be merged with this dataset. Therefore, for all the following datasets, we will make sure that we only keep data for the same countries and years as in this dataset. We have a total of 166 countries and the years range from 2000 to 2022.

2.3.2 Dataset on Unemployment rate

In this dataset, the initial step involves importing the data. Next, we ensure that the names and codes of the countries are formatted in UTF-8, preventing any discrepancies due to mismatches in country names. Following this, we modify the column names and filter the data to include only the relevant countries and years, specifically the years 2000 to 2022, covering 166 countries from our primary dataset.

Code
D2_1_Unemployment_rate <- read.csv(here("scripts","data","UnemploymentRate.csv")) %>%
  as.data.frame() %>%
  mutate(
    country = iconv(ref_area.label, to = "UTF-8", sub = "byte"),
    country = countrycode(country, "country.name", "country.name"),
    year = time,
    `unemployment rate` = obs_value / 100,
    age_category = classif1.label,
    sex = sex.label
  ) %>%
  select(-ref_area.label, -time, -obs_value, -classif1.label, -sex.label, -source.label, -obs_status.label, -indicator.label) %>%
  merge(D1_0_SDG_country_list[, c("country", "code")], by = "country", all.x = TRUE) %>%
  filter(year >= 2000 & year <= 2022,
         !str_detect(sex, fixed("Male")) & !str_detect(sex, fixed("Female")),
         code %in% D1_0_SDG_country_list$code,
         age_category == "Age (Youth, adults): 15+") %>%
  select(code, country, year, `unemployment rate`) %>%
  distinct()

Here are the first few lines of the cleaned dataset on Unemployment rate:

For this dataset, we reduced the size from 82,800 observations across 8 variables to 3,812 observations for 5 variables.

2.3.3 Dataset on GDP and military expenditures

We have three different databases which contain information on each country over the years. Each year represents one variable. We want to extract three variables for our analysis: GDP per capita, military expenditures in percentage of GDP and military expenditures in percentage of government expenditures.

Code
GDPpercapita <-
  read.csv(here("scripts","data","GDPpercapita.csv"), sep = ";")
MilitaryExpenditurePercentGDP <-
  read.csv(here("scripts","data","MilitaryExpenditurePercentGDP.csv"), sep = ";")
MiliratyExpenditurePercentGovExp <-
  read.csv(here("scripts","data","MiliratyExpenditurePercentGovExp.csv"), sep = ";")

After importing the data, we fill in the missing country codes using the column Indicator.Name: after some manipulations, we realized that some of the country codes were wrong, but that the next column contained the right ones.

Code
fill_code <- function(data){
  data <- data %>%
    mutate(Country.Code = ifelse(!grepl("^[A-Z]{3}$", Country.Code), Indicator.Name, Country.Code))
}
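To illustrate the test used in fill_code, the sketch below applies the same pattern to a few hypothetical values. The broken codes ("The", "Dem. Rep.", "Rep") are the kind of shifted values described for this dataset; the fallback vector is an assumption standing in for the Indicator.Name column.

```r
# Hypothetical values of Country.Code, some valid and some shifted.
codes <- c("BHS", "The", "Dem. Rep.", "COD", "Rep")
# Hypothetical values of the neighbouring column used as a fallback.
fallback <- c("x", "BHS", "COD", "y", "COG")

# A valid code is exactly three upper-case letters.
is_iso3 <- grepl("^[A-Z]{3}$", codes)

# Keep valid codes, otherwise take the value from the fallback column.
fixed <- ifelse(is_iso3, codes, fallback)
fixed
```

Note that "Rep" and "The" have three characters but are rejected because the pattern requires upper-case letters only.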

We create a set of functions that we will apply to each database. First, remove the variables that we don’t need, which are the years before 2000. Second, make sure that the values are numeric and rename the year variables (because they all had an “X” before year number). Third, transform the database from wide to long, in order to match the main database. Fourth, transform the year variable into an integer variable and rearrange and rename the columns to match the ones of the other databases. Then, we apply these transformations to the three databases.

Code
remove <- function(data){
  years <- seq(1960, 1999)
  removeyears <- paste("X", years, sep = "")
  data <- data[, !(names(data) %in% c("Indicator.Name", "Indicator.Code", "X", removeyears))]
}

makenum <- function(data) {
  for (i in 2000:2022) {
    year <- paste("X", i, sep = "")
    data[[year]] <- as.numeric(data[[year]])
  }
  return(data)
}

renameyear <- function(data) {
  for (i in 2000:2022) {
    varname <- paste("X", i, sep = "")
    names(data)[names(data) == varname] <- gsub("X", "", varname)
  }
  return(data)
}

wide2long <- function(data) {
  data <- pivot_longer(data, 
                       cols = -c("Country.Name", "Country.Code"), 
                       names_to = "year", 
                       values_to = "data")
  return(data)
}

yearint <- function(data) {
  data$year <- as.integer(data$year)
  return(data)
}

nameorder <- function(data) {
  colnames(data) <- c("country", "code", "year", "data")
  data <- data %>% select(c("code", "country", "year", "data"))
}

cleanwide2long <- function(data){
  data <- fill_code(data)
  data <- remove(data)
  data <- makenum(data)
  data <- renameyear(data)
  data <- wide2long(data)
  data <- yearint(data)
  data <- nameorder(data)
}

GDPpercapita <- cleanwide2long(GDPpercapita)
MilitaryExpenditurePercentGDP <- cleanwide2long(MilitaryExpenditurePercentGDP)
MiliratyExpenditurePercentGovExp <- cleanwide2long(MiliratyExpenditurePercentGovExp)

We rename the columns containing the main information, standardize the country codes and remove the countries that are not in our main database. We see that all 166 countries are there.

Code
GDPpercapita <- GDPpercapita %>%
  rename(GDPpercapita = data)
MilitaryExpenditurePercentGDP <- MilitaryExpenditurePercentGDP %>%
  rename(MilitaryExpenditurePercentGDP = data)
MiliratyExpenditurePercentGovExp <- MiliratyExpenditurePercentGovExp %>%
  rename(MiliratyExpenditurePercentGovExp = data)

GDPpercapita$code <- countrycode(
  sourcevar = GDPpercapita$code,
  origin = "iso3c",
  destination = "iso3c",
)

MilitaryExpenditurePercentGDP$code <- countrycode(
  sourcevar = MilitaryExpenditurePercentGDP$code,
  origin = "iso3c",
  destination = "iso3c",
)

MiliratyExpenditurePercentGovExp$code <- countrycode(
  sourcevar = MiliratyExpenditurePercentGovExp$code,
  origin = "iso3c",
  destination = "iso3c",
)

GDPpercapita <- GDPpercapita %>% filter(code %in% list_country)
length(unique(GDPpercapita$code))
#> [1] 166

MilitaryExpenditurePercentGDP <- MilitaryExpenditurePercentGDP %>% filter(code %in% list_country)
length(unique(MilitaryExpenditurePercentGDP$code))
#> [1] 166

MiliratyExpenditurePercentGovExp <- MiliratyExpenditurePercentGovExp %>% filter(code %in% list_country)
length(unique(MiliratyExpenditurePercentGovExp$code))
#> [1] 166

Initially, only 157 countries were both in the main SDG dataset and in these 3 datasets, but we suspected that some of the missing countries were in the databases but not correctly matched. Indeed, Bahamas was in the database, but instead of the code “BHS” there was “The”; for “COD” it was “Dem. Rep.”, for “COG” it was “Rep”, etc. We noticed that the correct code is in another column of the initial database: “Indicator.Name”. We went back to the initial database and put in the right codes before cleaning it (as seen above); after rerunning the code, we have all 166 countries from the main dataset.

Code
list_country_GDP <- c(unique(GDPpercapita$code))
setdiff(list_country, list_country_GDP)
#> character(0)
Code
D3_1_GDP_per_capita <- GDPpercapita
D3_2_Military_Expenditure_Percent_GDP <- MilitaryExpenditurePercentGDP
D3_3_Miliraty_Expenditure_Percent_Gov_Exp <- MiliratyExpenditurePercentGovExp

Here are the first few lines of the cleaned dataset of GDP per capita:

For this dataset, we went from ??? observations for 68 variables to 3818 observations for 4 variables.

Here are the first few lines of the cleaned dataset of military expenditures in percentage of GDP:

For this dataset, we went from ??? observations for 68 variables to 3818 observations for 4 variables.

Here are the first few lines of the cleaned dataset of military expenditures in percentage of government expenditures:

2.3.4 Dataset on internet usage

To prepare the dataset on internet usage in the world to be merged with the other data, we first import the data. Then, we keep only the years that we are interested in (2000 to 2022). We also rename the columns and keep only the countries that match the list of countries in the main dataset on the SDGs.

Code
D4_0_Internet_usage <- read.csv(here("scripts", "data", "InternetUsage.csv")) %>%
  filter(Year >= 2000, Year <= 2022) %>%
  rename(
    code = Code,
    country = Entity,
    year = Year,
    internet_usage = Individuals.using.the.Internet....of.population.
  ) %>%
  mutate(internet_usage = internet_usage / 100) %>%
  filter(code %in% list_country) %>%
  select(code, country, year, internet_usage)

Here are the first few lines of the cleaned dataset of internet usage:

For this dataset, we reduced the size from 6,570 observations across 4 variables to 3,433 observations for 4 variables.

2.3.5 Dataset on human freedom index

After importing the data from the Cato Institute website, we noticed that even though the file was called “Human Freedom Index 2022”, the available observations only ranged from 2000 to 2020. We first modified it to match our other datasets by renaming, encoding and standardizing the column containing the country names.

Code
data <- read.csv(here("scripts", "data", "human-freedom-index-2022.csv"))

# Convert the data to a tibble
datatibble <- tibble(data)

# Rename the column countries into country to match the other databases
names(datatibble)[names(datatibble) == "countries"] <- "country"

# Make sure the encoding of the country names are UTF-8
datatibble$country <- iconv(datatibble$country, to = "UTF-8", sub = "byte")

# standardize country names
datatibble <- datatibble %>%
  mutate(country = countrycode(country, "country.name", "country.name"))

Once done, we could check which countries of our main SDG dataset were present in these observations. We decided to keep only the countries that matched between the two datasets.

Code
# Merge by country name
datatibble <- datatibble %>%
  left_join(D1_0_SDG_country_list, by = "country")

datatibble <- datatibble %>% filter(code %in% list_country)
(length(unique(datatibble$code)))
#> [1] 159

# See which ones are missing
list_country_free <- c(unique(datatibble$code))
setdiff(list_country, list_country_free)
#> [1] "AFG" "CUB" "MDV" "STP" "SSD" "TKM" "UZB"

# Turkey was missing but present in the initial database (it was a problem when standardizing the country names of D1_0_SDG_country_list that we corrected) and the other missing countries are: "AFG" "CUB" "MDV" "STP" "SSD" "TKM" "UZB"
D5_0_Human_freedom_index <- datatibble

Then, we noticed that many columns were not important for us, as the dataset had 141 variables. So we decided to keep only the ones that refer to country information (such as code, year, etc.) and the human freedom scores per category (pf for personal freedom, ef for economic freedom).

Code
# erasing useless columns to keep only the general ones. 
D5_0_Human_freedom_index <- select(D5_0_Human_freedom_index, year, country, region, hf_score, pf_rol, pf_ss, pf_movement, pf_religion, pf_assembly, pf_expression, pf_identity, pf_score, ef_government, ef_legal, ef_money, ef_trade, ef_regulation, ef_score, code)

D5_0_Human_freedom_index <- D5_0_Human_freedom_index %>%
  rename(
    pf_law = pf_rol,      # 5th column (rule of law) becomes "pf_law"
    pf_security = pf_ss   # 6th column (security and safety) becomes "pf_security"
  )

Here are the first few lines of the partially cleaned dataset on Human Freedom Index scores:

For this dataset, we reduced the size from 3’465 observations across 141 variables to 3339 observations for 4 variables.

2.3.6 Dataset on Disasters

For the disasters dataset, we imported the data from Kaggle, because we could not access the original dataset, which is private and comes from the EOSDIS system, an interactive interface for browsing full-resolution, global, daily satellite images from NASA. Once we made sure that our file called “Disasters” was converted into a data frame, we selected the specific columns that we were interested in.

Code
Disasters <- as.data.frame(read.csv(here("scripts", "data", "Disasters.csv"))) %>%
  select(Year, Country, ISO, Location, Continent, Disaster.Subgroup, Disaster.Type, Total.Deaths, No.Injured, No.Affected, No.Homeless, Total.Affected, Total.Damages...000.US..)

Because we knew that our file showed all the disasters in each country over the years (between 1970-2021) and that we wanted to focus on a specific period, we filtered our data to show the years between 2000 and 2022. Then we rearranged our data, changing the data types of all the columns and their names in order to match our other datasets.

Code
# Rearrange the columns, changed the type of data, renamed the columns
Rearanged_Disasters <- Disasters %>%
  filter(Year >= 2000 & Year <= 2022) %>%
  mutate(
    code = as.character(ISO),
    country = as.character(Country),
    year = as.integer(Year),
    continent = as.character(Continent),
    disaster.subgroup = as.character(Disaster.Subgroup),
    disaster.type = as.character(Disaster.Type),
    location = as.character(Location),
    total.deaths = as.numeric(Total.Deaths),
    no.injured = as.numeric(No.Injured),
    no.affected = as.numeric(No.Affected),
    no.homeless = as.numeric(No.Homeless),
    total.affected = as.numeric(Total.Affected),
    total.damages = as.numeric(Total.Damages...000.US..)
  )

We then grouped the data by “year”, “code”, “country” and “continent” and summarized it. We re-selected specific columns, as our first pre-selection was still too wide and some variables, such as disaster.subgroup and disaster.type, were not pertinent. We ordered the columns and sorted the rows by “code”, “country”, “year” and “continent” to match the other datasets.

Code
Disasters <- Rearanged_Disasters %>%
  group_by(year,code, country, continent) %>%
  summarize(
    total_deaths = sum(total.deaths, na.rm = TRUE),
    no_injured = sum(no.injured, na.rm = TRUE),
    no_affected = sum(no.affected, na.rm = TRUE),
    no_homeless = sum(no.homeless, na.rm = TRUE),
    total_affected = sum(total.affected, na.rm = TRUE),
    total_damages = sum(total.damages, na.rm = TRUE)
  ) 

D6_0_Disasters <- Disasters %>%
  select(code, country, year, continent, total_deaths, no_injured, no_affected, no_homeless, total_affected, total_damages) %>%
  arrange(code, country, year, continent)

Finally we filtered our disasters data to keep only the countries that are present in our main dataset. We analysed the missing countries and identified three countries (BHR, BRN, MLT) that are unexpectedly missing.

Code
D6_0_Disasters <- D6_0_Disasters %>% filter(code %in% list_country)
length(unique(D6_0_Disasters$code))
#> [1] 163

# Here we see which countries are missing
list_country_disasters <- c(unique(D6_0_Disasters$code))
setdiff(list_country, list_country_disasters)
#> [1] "BHR" "BRN" "MLT"

Here are the first few lines of the cleaned dataset on Disasters:

2.3.7 Dataset on COVID

This dataset contains information on the COVID-19 pandemic between 2020 and 2022. The observations are daily (year, month, day). After importing the database, we parse the date, given in the format YYYY-MM-DD, in order to keep only the year.

Code
COVID <- read.csv(here("scripts", "data", "COVID.csv")) %>%
  select(iso_code, location, date, new_cases_per_million, new_deaths_per_million, stringency_index) %>%
  mutate(date = as.integer(year(date)))

We perform a first round of investigation of the missing values before aggregating the values by year. We begin with the variables “cases per million” and “deaths per million”: for each country, we have either only missing values or a very low percentage of missing values (~1%), so we can compute the sum over each year and ignore the missing values without altering the data. Where all the values of a country and year are missing, the result should be NA; note that in R, sum(x, na.rm = TRUE) returns 0 in that case, so such groups have to be set to NA explicitly. We then look at the “stringency” variable, where we have 3 scenarios:

  1. ~20% of missing values: we ignore missing values when computing the mean to get an idea of the stringency each year (because we compute the mean stringency over the year, a few missing days are not a problem, as stringency cannot change that fast).

  2. all are missing: we can ignore the missing values when computing the mean, because it will still return a missing value (NaN).

  3. almost all are missing: here the mean doesn’t make sense, so we would replace the values by NAs to be coherent. The countries with this issue are: ERI, GUM, PRI and VIR. We check whether they are in our main dataset and, since none of them is, we can ignore the issue: these rows will be removed later anyway.

We aggregate the observations of all days of a year into one observation per country and year, using the sum for cases and deaths and the mean for stringency.
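Because the aggregation ignores missing values with na.rm = TRUE, it is worth recalling how base R behaves when a group contains no observed value at all. The short check below uses toy vectors, not project data.

```r
# Toy vectors, not project data.
all_missing <- c(NA_real_, NA_real_, NA_real_)
partly_missing <- c(1.5, NA, 2.5)

sum_all <- sum(all_missing, na.rm = TRUE)        # 0: the sum of nothing
mean_all <- mean(all_missing, na.rm = TRUE)      # NaN, which is.na() treats as missing
sum_part <- sum(partly_missing, na.rm = TRUE)    # 4
mean_part <- mean(partly_missing, na.rm = TRUE)  # 2
```

A yearly sum of 0 can therefore mean either "no cases reported" or "no data at all", whereas the mean flags the second situation by itself.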

Code
COVID1 <- COVID %>%
  group_by(iso_code) %>%
  summarize(NaDeaths = round(mean(is.na(new_deaths_per_million)),3),
            NaCases = round(mean(is.na(new_cases_per_million)), 3),
            NaStringency = round(mean(is.na(stringency_index)), 3)) %>%
  pivot_longer(cols = starts_with("Na"), names_to = "Variable", values_to = "NaValue")%>%
  filter(NaValue!=0)

issue_list <- c("ERI", "GUM", "PRI", "VIR")
is.element(issue_list, list_country)
#> [1] FALSE FALSE FALSE FALSE

COVID <- COVID %>%
  group_by(location, date) %>%
  mutate(
    cases_per_million = sum(new_cases_per_million, na.rm = TRUE),
    deaths_per_million = sum(new_deaths_per_million, na.rm = TRUE),
    stringency = mean(stringency_index, na.rm = TRUE)
  )%>%
  ungroup()
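
One detail worth checking in the aggregation above is how R treats a group in which every value is missing. The sketch below uses made-up vectors rather than the COVID data, and `sum_or_na` is a hypothetical helper introduced only for illustration.

```r
# Made-up daily values for one country-year (illustrative only)
all_missing <- c(NA_real_, NA_real_, NA_real_)
few_missing <- c(1.5, NA, 2.5)

sum_all_missing  <- sum(all_missing, na.rm = TRUE)   # empty sum: 0, not NA
mean_all_missing <- mean(all_missing, na.rm = TRUE)  # mean of nothing: NaN
sum_few_missing  <- sum(few_missing, na.rm = TRUE)   # 1.5 + 2.5 = 4

# Hypothetical helper: return NA when there is nothing to sum
sum_or_na <- function(x) {
  if (all(is.na(x))) NA_real_ else sum(x, na.rm = TRUE)
}

print(c(sum_all_missing, mean_all_missing, sum_or_na(all_missing)))
```

If zeros for countries without any data are not desired, a helper of this kind could replace the plain sum. Whether that is needed here depends on whether such countries remain in the final dataset.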

Now that all the variables of interest are aggregated by year, we remove all the variables that we don’t need and rename all the remaining variables to match the main dataset.

Code
COVID <- COVID %>%
  group_by(location, date) %>%
  distinct(date, .keep_all = TRUE) %>%
  ungroup()

COVID <- COVID %>% select(-c(new_cases_per_million, new_deaths_per_million, stringency_index))

colnames(COVID) <- c("code", "country", "year", "cases_per_million", "deaths_per_million", "stringency")

We remove the years that exceed 2022, we make sure that the country codes are all iso codes with 3 letters (we observe that sometimes they are preceded by “OWID_”) and we standardize the country codes.

Code
COVID <- COVID[COVID$year <= 2022, ]

COVID$code <- gsub("OWID_", "", COVID$code)

COVID$code <- countrycode(
  sourcevar = COVID$code,
  origin = "iso3c",
  destination = "iso3c"
)
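
The prefix removal can be checked on a few made-up codes. The sketch below is in base R and only tests the shape of the result (three capital letters); the example codes are assumptions chosen for illustration.

```r
# Made-up codes, some carrying the "OWID_" prefix (illustrative only)
raw_codes <- c("CHE", "OWID_KOS", "FRA", "OWID_WRL")

# Remove the prefix only when it appears at the start of the string
clean_codes <- sub("^OWID_", "", raw_codes)

# Check that every cleaned code is made of exactly three capital letters
has_iso3_shape <- grepl("^[A-Z]{3}$", clean_codes)

print(data.frame(raw_codes, clean_codes, has_iso3_shape))
```

A three-letter shape does not guarantee a valid ISO code (an aggregate such as a world total is not a country), so a standardization step like the countrycode() call above remains useful.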

We remove the observations of countries that aren’t in our main dataset on SDGs and find that all the 166 countries that we have in the main SDG dataset are also in this one.

Code
D7_0_COVID <- COVID %>% filter(code %in% list_country)
length(unique(COVID$code))
#> [1] 238

Here are the first few lines of the cleaned dataset on COVID19:

2.3.8 Dataset on Conflicts

For our conflicts dataset, we imported the data from the World Bank data catalog. Once we made sure that our file, called “Conflicts”, was converted into a data frame, we selected the specific columns that we were interested in.

Code
Conflicts <- read.csv(here("scripts", "data", "Conflicts.csv")) %>%
  as.data.frame() %>%
  select(year, country, ongoing, gwsum_bestdeaths, pop_affected, 
         peaceyearshigh, area_affected, maxintensity, maxcumulativeintensity)

Our file lists the conflicts and their consequences per country and year between 2000 and 2016. We could not find a better or more complete dataset. As we consider conflicts as events, we only take into account the results between 2000 and 2016. We then rearranged the data, changing the data types of the columns and their names in order to match our other datasets. Finally, we grouped the data by “year” and “country”, re-selected some variables and summarized the data.

Code
Rearanged_Conflicts <- Conflicts %>%
  filter(year >= 2000 & year <= 2022)%>%
  mutate(
    ongoing = as.integer(ongoing),
    country = as.character(country),
    year = as.integer(year),
    gwsum_bestdeaths = as.numeric(gwsum_bestdeaths),
    pop_affected = as.numeric(pop_affected),
    area_affected = as.numeric(area_affected),
    maxintensity = as.numeric(maxintensity),
    )

# Group the data by "year", "country" and summarize the data
Conflicts <- Rearanged_Conflicts %>%
  group_by(year, country) %>%
  summarize(
    ongoing = sum (ongoing, na.rm = TRUE),
    sum_deaths = sum(gwsum_bestdeaths, na.rm = TRUE),
    pop_affected = sum(pop_affected, na.rm = TRUE),
    area_affected = sum(area_affected, na.rm = TRUE),
    maxintensity = sum(maxintensity, na.rm = TRUE),
  )

Afterwards, we select specific columns from the summarized data and arrange the rows by country and year. To make our dataset compatible with the main one and let the merge succeed, we made some adjustments to the country names: we standardize them and merge by country name to retrieve the country codes. Finally, we keep only the countries present in our main dataset. Note that in the end only one country of the main dataset is missing, because it was not in the initial conflicts database: BLR.

Code
conflicts <- Conflicts %>%
  select(country, year, ongoing, sum_deaths, pop_affected, area_affected, maxintensity) %>%
  arrange(country, year)

conflicts$country <- iconv(conflicts$country, to = "UTF-8", sub = "byte")

conflicts <- conflicts %>%
  mutate(country = countrycode(country, "country.name", "country.name"))

conflicts <- conflicts %>%
  left_join(D1_0_SDG_country_list, by = "country")

conflicts <- conflicts %>%
  select(code, country, year, ongoing, sum_deaths, pop_affected, area_affected, maxintensity) %>%
  arrange(code, country, year)


D8_0_Conflicts <- conflicts %>% filter(code %in% list_country)
(length(unique(conflicts$code)))
#> [1] 166

# See which countries are missing
list_country_conflicts <- c(unique(conflicts$code))
setdiff(list_country, list_country_conflicts)
#> [1] "BLR"

Here are the first few lines of the cleaned dataset on Conflicts:

2.3.9 Merging our dataset

By merging our eight pre-cleaned datasets, we create a final database.

Code
D2_1_Unemployment_rate$country <- NULL
merge_1_2 <- D1_0_SDG |> left_join(D2_1_Unemployment_rate, join_by(code, year))

D3_1_GDP_per_capita$country <- NULL
merge_12_3 <- merge_1_2 |> left_join(D3_1_GDP_per_capita, join_by(code, year))

D3_2_Military_Expenditure_Percent_GDP$country <- NULL
merge_12_3 <- merge_12_3 |> left_join(D3_2_Military_Expenditure_Percent_GDP, join_by(code, year)) 

D3_3_Miliraty_Expenditure_Percent_Gov_Exp$country <- NULL
merge_12_3 <- merge_12_3 |> left_join(D3_3_Miliraty_Expenditure_Percent_Gov_Exp, join_by(code, year)) 

D4_0_Internet_usage$country <- NULL
merge_123_4 <- merge_12_3 |> left_join(D4_0_Internet_usage, join_by(code, year)) 

D5_0_Human_freedom_index$country <- NULL
merge_1234_5 <- merge_123_4 |> left_join(D5_0_Human_freedom_index, join_by(code, year)) 

D6_0_Disasters$country <- NULL
merge_12345_6 <- merge_1234_5 |> left_join(D6_0_Disasters, join_by(code, year)) 

D7_0_COVID$country <- NULL
D7_0_COVID <- D7_0_COVID |> distinct(code, year, .keep_all = TRUE)
merge_123456_7 <- merge_12345_6 |> left_join(D7_0_COVID, join_by(code, year)) 

D8_0_Conflicts$country <- NULL
all_Merge <- merge_123456_7 |> left_join(D8_0_Conflicts, join_by(code, year)) 
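
Because every join above relies on the key (code, year), it can help to verify that the key is unique in the right-hand table before joining; otherwise a left join duplicates rows. The following base R sketch uses made-up tables, and `has_unique_key` is a hypothetical helper, not part of the project's code.

```r
# Made-up tables keyed by (code, year); values are illustrative only
sdg <- data.frame(
  code = c("CHE", "CHE", "FRA"),
  year = c(2019L, 2020L, 2020L),
  overallscore = c(80, 81, 79)
)
covid <- data.frame(
  code = c("CHE", "CHE", "FRA"),
  year = c(2020L, 2020L, 2020L),
  stringency = c(40, 40, 45)
)

# Hypothetical helper: TRUE when no (code, year) pair appears twice
has_unique_key <- function(df, keys = c("code", "year")) {
  anyDuplicated(df[keys]) == 0
}

# Keep the first row of each (code, year) pair
covid_unique <- covid[!duplicated(covid[c("code", "year")]), ]

# Left join: all rows of sdg are kept, unmatched rows get NA
merged <- merge(sdg, covid_unique, by = c("code", "year"), all.x = TRUE)

print(has_unique_key(covid))         # FALSE: CHE 2020 appears twice
print(has_unique_key(covid_unique))  # TRUE
print(nrow(merged) == nrow(sdg))     # TRUE: no row was duplicated
```

This mirrors the distinct(code, year, .keep_all = TRUE) call applied to D7_0_COVID above; the same check could be run on the other datasets.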

2.3.10 Cleaning of the final database

2.3.10.1 FILLING COLUMNS

When we merged our datasets, we noticed that some countries were not assigned their corresponding continent and/or region. This issue arose because we sourced the continent and region data from secondary databases, not from our main one. We now add the missing continents and regions.

Code
#### Filling missing continents and regions ####

# Update all_Merge with region and continent information
all_Merge <- all_Merge %>%
  group_by(country) %>%
  mutate(
    continent = ifelse(is.na(continent), first(na.omit(continent)), continent),
    region = ifelse(is.na(region), first(na.omit(region)), region)
    ) %>%
  ungroup() %>%
  mutate(continent = case_when(
    code %in% c("BHR") ~ "Asia",
    code %in% c("BRN") ~ "Asia",
    code %in% c("MLT") ~ "Europe",
      TRUE ~ continent
    ))

# Load Disasters dataset to add region information
Disasters <- read.csv(here("scripts", "data", "Disasters.csv")) %>%
  select(ISO, Region) %>%
  distinct(ISO, Region, .keep_all = TRUE) %>%
  rename(code = ISO, region = Region)
# Merge All_Merge with Disasters dataset
all_Merge <- left_join(all_Merge, Disasters, by = "code") %>%
  mutate(region = ifelse(is.na(region.x), region.y, region.x)) %>%
  select(-region.x, -region.y)

We reorder the columns of the database so that it begins with the country code, the year, the country, the continent and the region.

Code
all_Merge <- as.data.frame(all_Merge) %>%
  select(code, year, country, continent, region, everything())

write.csv(all_Merge, file = here("scripts","data","all_Merge.csv"))

Here are the first few lines of the final dataset:

Final structure of our merged database: each of the 166 countries from D1_1_SDG is observed each year from 2000 to 2022, so each row has a key composed of (code, year) that uniquely identifies an observation. The other columns are the variables listed above. Because some countries have a lot of missing information, we will have to eliminate some of them, but we will still have more than 2000 rows in our database.

2.3.11 Treatment of missing values

We load our final database and we visualize the missing values.

Code
all_Merge <- read.csv(here("scripts","data","all_Merge.csv"))

# Remove unnecessary column
all_Merge <- all_Merge %>% select(-c(X))

# Create a dataframe with the goals without NAs summarize in one column to simplify the visualization
goal_vars <- all_Merge %>%
  select(starts_with("goal")) %>%
  filter_all(all_vars(!is.na(.))) %>%
  colnames()
to_plot_missing <- all_Merge %>%
  mutate(Goals_without_NAs = rowSums(!is.na(select(., all_of(goal_vars))))) %>%
  select(-c(goal2, goal3, goal4, goal5, goal6, goal7, goal8, goal9, goal11, goal12, goal13, goal15, goal16, goal17))

vis_dat(to_plot_missing, warn_large_data = FALSE) + scale_fill_brewer(palette = "Paired") +
  theme(
    axis.text.x = element_text(angle = 90, size = 6),
    legend.text = element_text(size = 8),  # Adjust the size of legend text
    legend.title = element_text(size = 10) 
  )

For each of our research questions, we start from the merged dataset and deal with the missing values separately. This way, we do not delete observations unnecessarily.

For question 1, we only keep the years until 2020, because most of the explanatory variables that we want to use (those coming from the human freedom index) only have values until 2020.

Code
data_question1 <- all_Merge %>%
  filter(year<=2020) %>%
  select(-c(total_deaths, no_injured, no_affected, no_homeless, total_affected, total_damages, cases_per_million, deaths_per_million, stringency, ongoing, sum_deaths, pop_affected, area_affected, maxintensity))

For questions 2 and 4, we use the main data from the SDG database.

Code
data_question24 <- all_Merge %>%
  select(c(code, year, country, continent, region, overallscore, goal1, goal2, goal3, goal4, goal5, goal6, goal7, goal8, goal9, goal10, goal11, goal12, goal13, goal15, goal16, goal17))

For question 3, we create 3 distinct databases according to the different types of events that we will analyse: disasters, COVID19 and conflicts. For the disasters, we only keep the years until 2021, because we have no data after this date. For the conflicts, we only keep the years until 2016, for the same reason.

Code
# Disasters
data_question3_1 <- all_Merge %>%
  filter(year<=2021) %>%
  select(c(code, year, country, continent, region, overallscore, goal1, goal2, goal3, goal4, goal5, goal6, goal7, goal8, goal9, goal10, goal11, goal12, goal13, goal15, goal16, goal17, total_deaths, no_injured, no_affected, no_homeless, total_affected, total_damages))

# COVID
data_question3_2 <- all_Merge %>%
  select(c(code, year, country, continent, region, overallscore, goal1, goal2, goal3, goal4, goal5, goal6, goal7, goal8, goal9, goal10, goal11, goal12, goal13, goal15, goal16, goal17, cases_per_million, deaths_per_million, stringency))

# Conflicts 
data_question3_3 <- all_Merge %>%
  filter(year<=2016) %>%
  select(c(code, year, country, continent, region, overallscore, goal1, goal2, goal3, goal4, goal5, goal6, goal7, goal8, goal9, goal10, goal11, goal12, goal13, goal15, goal16, goal17, ongoing, sum_deaths, pop_affected, area_affected, maxintensity))

2.3.11.1 Data for question 1

2.3.11.1.1 Dealing with missing values in columns

We begin by visualizing the missing values. To have a less messy graph, we group all the goals without NAs into one single variable. We decide to remove MilitaryExpenditurePercentGovExp, because it has too many missing values and it contains similar information to MilitaryExpenditurePercentGDP. We also remove hf_score, pf_score and ef_score, because they have many missing values and, since these variables summarize the other ones, deleting them does not make us lose information.

Code
# Create a dataframe with the goals without NAs summarize in one column to simplify the visualization
variable_names <- names(data_question1)
missing_percentages <- sapply(data_question1, function(col) mean(is.na(col)) * 100)

missing_data_summary <- data.frame(
  Variable = variable_names,
  Missing_Percentage = missing_percentages
)

missing_data_summary <- missing_data_summary %>%
  mutate(VariableGroup = ifelse(startsWith(Variable, "goal") & Missing_Percentage == 0, "Goals without NAs", as.character(Variable)))

ggplot(data = missing_data_summary, aes(x = reorder(VariableGroup, Missing_Percentage), y = Missing_Percentage, fill = Missing_Percentage)) +
  geom_bar(stat = "identity") +
  geom_text(aes(label = ifelse(Missing_Percentage > 1, sprintf("%.1f%%", Missing_Percentage), ""),
                y = Missing_Percentage),
            position = position_stack(vjust = 1),  # Adjust vertical position
            color = "white",  # Text color
            size = 2,          # Text size
            hjust = 1.05) +
  labs(title = "Percentage of Missing Values by Variable",
       x = "Variable",
       y = "Missing Percentage") +
  theme_minimal() +
  theme(axis.text.y = element_text(hjust = 1, size=6 ),
        legend.text = element_text(size = 8),
        legend.title = element_text(size = 10)) +
  labs(fill = "% NAs") +
  coord_flip()

data_question1 <- data_question1 %>% select(-c(MiliratyExpenditurePercentGovExp, hf_score, pf_score, ef_score))

2.3.11.1.2 Dealing with missing values in rows

We create a column with the number of missing values by country over all the variables, except goal 1 and goal 10 that we already discussed. We decide to remove the countries that have more than 50 missing values.

Code
# Count the NAs per country and variable, then keep the countries with more than 50 NAs in total
see_missing1_1 <- data_question1 %>%
  group_by(code) %>%
  summarise(across(-c(year, country, continent, region, population, overallscore, goal1, goal2, goal3, goal4, goal5, goal6, goal7, goal8, goal9, goal10, goal11, goal12, goal13, goal15, goal16, goal17), 
                   ~ sum(is.na(.)))) %>%
  mutate(num_missing = rowSums(across(-code))) %>%
  filter(num_missing > 50)

data_question1 <- data_question1 %>% filter(!code %in% see_missing1_1$code)

Here is the graph that allows us to visualize the countries that have missing values, how many and for which variables, when there are more than 50 NAs in total.

Code
ggplot(see_missing1_1, aes(x = num_missing , y = reorder(code, num_missing), fill = num_missing)) +
    geom_bar(stat = "identity") + 
    scale_fill_gradient(low = "lightgreen", high = "darkgreen") +
    theme_minimal() +
  theme(axis.text.y = element_text(hjust = 1, size=8 ),
        legend.text = element_text(size = 8),
        legend.title = element_text(size = 10)) +
    labs(title = "Number of missing values per country with more than 50 NAs", x = "Number of Missing Values", y = "Countries")

Code
# Count the NAs per country and variable, then keep the countries that still have NAs
see_missing1_2 <- data_question1 %>%
  group_by(code) %>%
  summarise(across(-c(year, country, continent, region, population, overallscore, goal1, goal2, goal3, goal4, goal5, goal6, goal7, goal8, goal9, goal10, goal11, goal12, goal13, goal15, goal16, goal17),
                   ~ sum(is.na(.)))) %>%
  mutate(num_missing = rowSums(across(-code))) %>%
  filter(num_missing > 0)

Here is the ggplot that helps us to visualize the countries that have missing values after removing the countries with more than 50 NAs.

Code
ggplot(see_missing1_2, aes(x = num_missing , y = reorder(code, num_missing), fill = num_missing)) +
    geom_bar(stat = "identity", width = 0.5) + 
    scale_fill_gradient(low = "lightgreen", high = "darkgreen") +
    theme_minimal() +
  theme(axis.text.y = element_text(hjust = 1, size= 6 ),
        legend.text = element_text(size = 8),
        legend.title = element_text(size = 10)) +
        labs(title = "Number of missing values per country", x = "Number of Missing Values", y = "Countries")

We also look at patterns of missing values in the rows and see that, except for the two goals with NAs that we discussed earlier and for the triplet “ef_money”, “ef_trade” and “ef_regulation”, there are no well-defined patterns. We remove the observations that have NAs in all three of these variables at the same time, and then keep only the countries that are still observed for every year from 2000 to 2020.

Code
naniar::gg_miss_upset(data_question1, nsets=10, nintersects=11)

data_question1 <- data_question1[rowSums(is.na(data_question1[, c("ef_money", "ef_trade", "ef_regulation")])) < 3, ]

data_question1 <- data_question1 %>%
  group_by(code) %>%
  filter(all(2000:2020 %in% year)) %>%
  ungroup()
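
The last filter above keeps a country only if every year of the period is still present after rows were dropped. A minimal base R sketch of the same idea, on a made-up panel with a shortened period, could look like this:

```r
# Made-up panel: "AAA" has all years, "BBB" lost the year 2001 (illustrative only)
panel <- data.frame(
  code = c("AAA", "AAA", "AAA", "BBB", "BBB"),
  year = c(2000L, 2001L, 2002L, 2000L, 2002L)
)
required_years <- 2000:2002

# For each country, check that all required years are present
has_all_years <- vapply(
  split(panel$year, panel$code),
  function(y) all(required_years %in% y),
  logical(1)
)
complete_codes <- names(has_all_years)[has_all_years]

# Keep only the countries with a complete series
balanced <- panel[panel$code %in% complete_codes, ]
print(balanced)
```

Dropping a whole country when a single year is missing keeps the panel balanced, at the cost of losing the remaining observations of that country.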

2.3.11.1.3 GDP per capita

Only Venezuela has missing values that we can not fill (because the evolution over time is not linear), so we delete the country.

Code
question1_missing_GDP <- data_question1 %>%
  group_by(code) %>%
  summarize(NaGDPpercapita = mean(is.na(GDPpercapita)))%>%
  filter(NaGDPpercapita != 0)

data_question1 <- data_question1 %>% filter(code!="VEN")

2.3.11.1.4 Military expenditure in % of GDP

We plot the evolution of MilitaryExpenditurePercentGDP over the years for each country containing missing values, and use colors to distinguish the percentage of missing values.

Code
MilitaryExpenditurePercentGDP1 <- data_question1 %>%
  group_by(code) %>%
  summarize(NaMil1 = round(mean(is.na(MilitaryExpenditurePercentGDP)),3)) %>%
  filter(NaMil1 != 0)

filtered_data_Mil1 <- MilitaryExpenditurePercentGDP %>%
  filter(code %in% MilitaryExpenditurePercentGDP1$code) # countries with NAs

filtered_data_Mil1 <- filtered_data_Mil1 %>%
  group_by(code) %>%
  mutate(PercentageMissing = mean(is.na(MilitaryExpenditurePercentGDP))) %>% # Column % NAs
  ungroup()

Evol_Missing_Mil1 <- ggplot(data = filtered_data_Mil1) +
  geom_line(aes(x = year, y = MilitaryExpenditurePercentGDP, 
                 color = cut(PercentageMissing,
                             breaks = c(0, 0.1, 0.2, 0.3, 1),
                             labels = c("0-10%", "10-20%", "20-30%", "30-100%")))) +
  labs(title = "Military expenditure in % of GDP over time", x = "Years from 2000 to 2022", y = "Military expenditure in % of GDP") +
  scale_color_manual(values = c("0-10%" = "blue", "10-20%" = "green", "20-30%" = "red", "30-100%" = "black"),
                     labels = c("0-10%", "10-20%", "20-30%", "30-100%")) +
  guides(color = guide_legend(title = "% missings")) +
  facet_wrap(~ code, nrow = 5) +
  theme(strip.text = element_text(size = 6)) +
  scale_x_continuous(breaks = NULL) +
  scale_y_continuous(breaks = NULL)

print(Evol_Missing_Mil1)

We delete the countries with more than 30% of missing values. For the countries with less than 30% of missing values and a linear evolution over time, we fill the missing values using linear interpolation.

Code
# Countries with more than 30% of missing values
data_question1 <- data_question1 %>% filter(!code %in% c("ARE", "BHS", "BRB", "CRI", "HTI", "ISL", "PAN", "SYR", "VNM"))

list_code <- c("BDI", "BEN", "CAF", "CIV", "COD", "GAB", "NER", "TGO", "TTO", "ZMB")

for (i in list_code) {
  country_data <- data_question1 %>% filter(code == i)
  interpolated_data <- na.interp(country_data$MilitaryExpenditurePercentGDP)
  data_question1[data_question1$code == i, "MilitaryExpenditurePercentGDP"] <- interpolated_data
}
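
To illustrate what linear interpolation does to a series with gaps, here is a base R sketch on made-up values. It is a stand-in built on approx(), not the implementation of na.interp() from the forecast package used above, and `interpolate_linear` is a hypothetical helper. It assumes the series has at least two known values.

```r
# Hypothetical helper: fill NAs by linear interpolation between known points.
# rule = 2 carries the nearest known value to leading or trailing NAs.
interpolate_linear <- function(x) {
  idx <- seq_along(x)
  known <- !is.na(x)
  approx(x = idx[known], y = x[known], xout = idx, rule = 2)$y
}

# Made-up yearly series with gaps (illustrative only)
series <- c(1, NA, 3, NA, NA, 6)
filled <- interpolate_linear(series)

print(filled)
```

Each gap is filled with evenly spaced values between its two neighbours, which is only reasonable when the plotted evolution is close to a straight line, as required in the text above.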

Then, we look at the distribution of the variable per region. Since all the distributions are skewed, we replace the remaining missing values, for the countries with less than 30% of missing values, by the median of the region.

Code
question1_missing_Military <- data_question1 %>%
  group_by(code) %>%
  mutate(PercentageMissing = mean(is.na(MilitaryExpenditurePercentGDP))) %>% # Column % NAs
  ungroup() %>%
  group_by(region) %>%
  filter(sum(PercentageMissing, na.rm = TRUE) > 0)

Freq_Missing_Military <- ggplot(data = question1_missing_Military) +
  geom_histogram(aes(x = MilitaryExpenditurePercentGDP, 
                     fill = cut(PercentageMissing,
                                breaks = c(0, 0.1, 0.2, 0.3, 1),
                                labels = c("0-10%", "10-20%", "20-30%", "30-100%"))),
                 bins = 30) +
  labs(title = "Distribution of Military expenditures in % of GDP", x = "Military expenditures in % of GDP", y = "Frequency") +
  scale_fill_manual(values = c("0-10%" = "blue", "10-20%" = "green", "20-30%"="red","30-100%" = "black"), labels = c("0-10%", "10-20%", "20-30%","30-100%")) +
  guides(fill = guide_legend(title = "% missings")) +
  facet_wrap(~ region, nrow = 1)

print(Freq_Missing_Military)

data_question1 <- data_question1 %>%
  group_by(code) %>%
  mutate(
    PercentageMissingByCode = mean(is.na(MilitaryExpenditurePercentGDP))
  ) %>%
  ungroup() %>%  
  group_by(region) %>%
  mutate(
    MedianByRegion = median(MilitaryExpenditurePercentGDP, na.rm = TRUE),
    MilitaryExpenditurePercentGDP = ifelse(
      PercentageMissingByCode < 0.3 & !is.na(MilitaryExpenditurePercentGDP),
      MilitaryExpenditurePercentGDP,
      ifelse(PercentageMissingByCode < 0.3, MedianByRegion, MilitaryExpenditurePercentGDP)
    )
  ) %>%
  select(-PercentageMissingByCode, -MedianByRegion)
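
The median-by-region replacement can be sketched in base R on made-up data. The example below leaves out the 30% condition for brevity, and its values are chosen to show why the median is preferred for skewed distributions.

```r
# Made-up values: region "A" is skewed by one large observation (illustrative only)
df <- data.frame(
  region = c("A", "A", "A", "A", "B", "B", "B"),
  value  = c(1, 2, 100, NA, 10, 12, NA)
)

# Median of the observed values, computed within each region
region_median <- ave(df$value, df$region,
                     FUN = function(v) median(v, na.rm = TRUE))

# Replace only the missing values; observed values are left untouched
df$value_filled <- ifelse(is.na(df$value), region_median, df$value)

print(df)
```

In region "A" the missing value becomes 2 (the median), whereas the mean would have given about 34.3 because of the single large observation.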

2.3.11.1.5 Internet usage

Only a low percentage of the values is missing.

Code
question1_missing_Internet <- data_question1 %>%
  group_by(code) %>%
  summarize(NaInternet = mean(is.na(internet_usage)))%>%
  filter(NaInternet != 0)

There are never more than 30% of NAs. We look at the evolution of the variable over time. We fill the missing values with linear interpolation, because all the series increase over time and are almost straight lines, except for CIV, which we delete.

Code
question1_missing_Internet <- data_question1 %>%
  group_by(code) %>%
  mutate(PercentageMissing = mean(is.na(internet_usage))) %>% # Column % NAs
  filter(code %in% question1_missing_Internet$code)

Evol_Missing_Internet <- ggplot(data = question1_missing_Internet) +
  geom_line(aes(x = year, y = internet_usage, 
                 color = cut(PercentageMissing,
                             breaks = c(0, 0.1, 0.2, 0.3, 1),
                             labels = c("0-10%", "10-20%", "20-30%", "30-100%")))) +
  labs(title = "Evolution of internet usage over time", x = "Years from 2000 to 2022", y = "Internet usage") +
  scale_color_manual(values = c("0-10%" = "blue", "10-20%" = "green", "20-30%" = "red", "30-100%" = "black"),
                     labels = c("0-10%", "10-20%", "20-30%", "30-100%")) +
  guides(color = guide_legend(title = "% missings")) +
  scale_x_continuous(breaks=NULL)+
  facet_wrap(~ code, nrow = 4)

print(Evol_Missing_Internet)

list_code <- setdiff(unique(question1_missing_Internet$code), "CIV")
for (i in list_code) {
  country_data <- data_question1 %>% filter(code == i)
  interpolated_data <- na.interp(country_data$internet_usage)
  data_question1[data_question1$code == i, "internet_usage"] <- interpolated_data
}

data_question1 <- data_question1 %>% filter(code!="CIV")

2.3.11.1.6 Human freedom index

2.3.11.1.6.1 Personal freedom: law

The variable pf_law has (many) NAs, but only for one country: BLZ, so we decide to remove it.

Code
data_question1 <- data_question1 %>%
  filter(code!="BLZ")

2.3.11.1.6.2 Economic freedom: government

There are no more missing values, thanks to the previous steps.

2.3.11.1.6.3 Economic freedom: money

5 countries have missing values, but the percentage of missing values is always below 25%.

Code
question1_missing_ef_money <- data_question1 %>%
  group_by(code) %>%
  summarize(Na_ef_money = mean(is.na(ef_money)))%>%
  filter(Na_ef_money != 0)

We look at the evolution of the variable over time, and for the countries with a linear evolution in time, we fill the missing values using linear interpolation.

Code
question1_missing_ef_money <- data_question1 %>%
  group_by(code) %>%
  mutate(PercentageMissing = mean(is.na(ef_money))) %>% # Column % NAs
  filter(code %in% question1_missing_ef_money$code)

Evol_Missing_ef_money <- ggplot(data = question1_missing_ef_money) +
  geom_line(aes(x = year, y = ef_money, 
                 color = cut(PercentageMissing,
                             breaks = c(0, 0.1, 0.2, 0.3, 1),
                             labels = c("0-10%", "10-20%", "20-30%", "30-100%")))) +
  labs(title = "Evolution of economic freedom: money over time", x = "Years from 2000 to 2022", y = "ef_money") +
  scale_color_manual(values = c("0-10%" = "blue", "10-20%" = "green", "20-30%" = "red", "30-100%" = "black"),
                     labels = c("0-10%", "10-20%", "20-30%", "30-100%")) +
  guides(color = guide_legend(title = "% missings")) +
  facet_wrap(~ code, nrow = 2) +
  scale_y_continuous(limits = c(0, 10))

print(Evol_Missing_ef_money)

list_code <- c("GEO", "MKD")
for (i in list_code) {
  country_data <- data_question1 %>% filter(code == i)
  interpolated_data <- na.interp(country_data$ef_money)
  data_question1[data_question1$code == i, "ef_money"] <- interpolated_data
}

Then, we look at the distribution of the variable per region. Seeing that all are skewed distributions, we decide to replace the missing values using the median by region.

Code
question1_missing_ef_money <- data_question1 %>%
  group_by(code) %>%
  mutate(PercentageMissing = mean(is.na(ef_money))) %>% # Column % NAs
  ungroup() %>%
  group_by(region) %>%
  filter(sum(PercentageMissing, na.rm = TRUE) > 0)

Freq_Missing_ef_money <- ggplot(data = question1_missing_ef_money) +
  geom_histogram(aes(x = ef_money, 
                     fill = cut(PercentageMissing,
                                breaks = c(0, 0.1, 0.2, 0.3, 1),
                                labels = c("0-10%", "10-20%", "20-30%", "30-100%"))),
                 bins = 30) +
  labs(title = "Distribution of economic freedom: money", x = "ef_money", y = "Frequency") +
  scale_fill_manual(values = c("0-10%" = "blue", "10-20%" = "green", "20-30%"="red","30-100%" = "black"), labels = c("0-10%", "10-20%", "20-30%","30-100%")) +
  guides(fill = guide_legend(title = "% missings")) +
  facet_wrap(~ region, nrow = 1)

print(Freq_Missing_ef_money)

data_question1 <- data_question1 %>%
  group_by(code) %>%
  mutate(
    PercentageMissingByCode = mean(is.na(ef_money))
  ) %>%
  ungroup() %>% 
  group_by(region) %>%
  mutate(
    MedianByRegion = median(ef_money, na.rm = TRUE),
    ef_money = ifelse(
      PercentageMissingByCode < 0.3 & !is.na(ef_money),
      ef_money,
      ifelse(PercentageMissingByCode < 0.3, MedianByRegion, ef_money)
    )
  ) %>%
  select(-PercentageMissingByCode, -MedianByRegion)

2.3.11.1.6.4 Economic freedom: trade

6 countries have missing values, but the percentage of missing values is always below 25%.

Code
question1_missing_ef_trade <- data_question1 %>%
  group_by(code) %>%
  summarize(Na_ef_trade = mean(is.na(ef_trade)))%>% # Column % NAs
  filter(Na_ef_trade != 0)

We look at the evolution of the variable over time. For the countries where this evolution is linear, we fill in the missing values using linear interpolation.

Code
question1_missing_ef_trade <- data_question1 %>%
  group_by(code) %>%
  mutate(PercentageMissing = mean(is.na(ef_trade))) %>% # Column % NAs
  filter(code %in% question1_missing_ef_trade$code)

Evol_Missing_ef_trade <- ggplot(data = question1_missing_ef_trade) +
  geom_line(aes(x = year, y = ef_trade, 
                 color = cut(PercentageMissing,
                             breaks = c(0, 0.1, 0.2, 0.3, 1),
                             labels = c("0-10%", "10-20%", "20-30%", "30-100%")))) +
  labs(title = "Evolution of economic freedom: trade over time", x = "Years from 2000 to 2022", y = "ef_trade") +
  scale_color_manual(values = c("0-10%" = "blue", "10-20%" = "green", "20-30%" = "red", "30-100%" = "black"),
                     labels = c("0-10%", "10-20%", "20-30%", "30-100%")) +
  guides(color = guide_legend(title = "% missings")) +
  facet_wrap(~ code) +
  scale_y_continuous(limits = c(0, 10))

print(Evol_Missing_ef_trade)

# Linear interpolation for the countries with a roughly linear evolution
list_code <- c("AZE", "GEO", "MKD", "MNG")
for (i in list_code) {
  country_data <- data_question1 %>% filter(code == i)
  interpolated_data <- na.interp(country_data$ef_trade)
  data_question1[data_question1$code == i, "ef_trade"] <- interpolated_data
}

Then, we look at the distribution of the variable per region. Since the only region that still has missing values has a centered distribution, we replace the missing values by the mean of the region.

Code
question1_missing_ef_trade <- data_question1 %>%
  group_by(code) %>%
  mutate(PercentageMissing = mean(is.na(ef_trade))) %>% # Column % NAs
  ungroup() %>%
  group_by(region) %>%
  filter(sum(PercentageMissing, na.rm = TRUE) > 0)

Freq_Missing_ef_trade <- ggplot(data = question1_missing_ef_trade) +
  geom_histogram(aes(x = ef_trade, 
                     fill = cut(PercentageMissing,
                                breaks = c(0, 0.1, 0.2, 0.3, 1),
                                labels = c("0-10%", "10-20%", "20-30%", "30-100%"))),
                 bins = 30) +
  labs(title = "Distribution of economic freedom: trade", x = "ef_trade", y = "Frequency") +
  scale_fill_manual(values = c("0-10%" = "blue", "10-20%" = "green", "20-30%"="red","30-100%" = "black"), labels = c("0-10%", "10-20%", "20-30%","30-100%")) +
  guides(fill = guide_legend(title = "% missings")) +
  facet_wrap(~ region, nrow = 2)

print(Freq_Missing_ef_trade)

data_question1 <- data_question1 %>%
  group_by(code) %>%
  mutate(
    PercentageMissingByCode = mean(is.na(ef_trade))
  ) %>%
  ungroup() %>% 
  group_by(region) %>%
  mutate(
    MeanByRegion = mean(ef_trade, na.rm = TRUE),
    ef_trade = ifelse(
      PercentageMissingByCode < 0.3 & !is.na(ef_trade),
      ef_trade,
      ifelse(PercentageMissingByCode < 0.3, MeanByRegion, ef_trade)
    )
  ) %>%
  select(-PercentageMissingByCode, -MeanByRegion)

2.3.11.1.6.5 Economic freedom: regulation

There are no more missing values, thanks to the previous steps.

2.3.11.1.7 SDGs 1 and 10

We noticed earlier that, among the SDG scores, only goals 1 and 10 have missing values. As before, we investigate where the NAs are located in our dataset, first for goal 1, then for goal 10.

Code
na_count <- sapply(data_question1, function(x) sum(is.na(x)))
na_count_df <- data.frame(variable = names(na_count), num_missing = na_count)
na_count_df_filtered <- subset(na_count_df, num_missing > 0)
ggplot(na_count_df_filtered, aes(x= num_missing,y=variable, fill = num_missing)) +
    geom_bar(aes(fill = num_missing), stat = "identity", width = 0.8, fill = 'lightblue') +
    geom_text(aes(label = num_missing), vjust = 0.5,hjust = 1.1, position = position_dodge(width = 0.9)) +
    theme_minimal() +
    theme(axis.text.y = element_text(hjust = 1, size=10 ), 
          legend.text = element_text(size = 8),
          legend.title = element_text(size = 10)) +
    labs(title = "Number of remaining missing values per variable ",
         x = "Number of NAs",
         y = "Variables")

# goal1
question1_missing_goal1 <- data_question1 %>%
  group_by(code) %>%
  summarize(Na_goal1 = mean(is.na(goal1)))%>%
  filter(Na_goal1 != 0)

data_question1 <- data_question1 %>% filter(!code %in% question1_missing_goal1$code)
# still 42 NA values goal10

We found that the missing values of goal 1 were located in only 5 countries, so we decided to remove them. At this stage, only 42 missing values remained. We then repeat the same step for goal 10.

Code
#goal10
question1_missing_goal10 <- data_question1 %>%
  group_by(code) %>%
  summarize(Na_goal10 = mean(is.na(goal10)))%>%
  filter(Na_goal10 != 0)

data_question1 <- data_question1 %>% filter(!code %in% question1_missing_goal10$code)

We found the last 2 countries containing missing values and removed them. Our dataset is now completely clean and ready to be used for question 1.

2.3.11.2 Data for questions 2 and 4

We create a column with the number of missing values by country over all the variables, except goal 1 and goal 10 that we already discussed. Since there are no other missing values, we stop here.

Code
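see_missing24 <- data_question24 %>%
  group_by(code) %>%
  # count the NAs of every variable per country, except goal1 and goal10
  summarise(across(-any_of(c("goal1", "goal10")), ~ sum(is.na(.)))) %>%
  # total number of NAs per country over the remaining variables
  mutate(num_missing = rowSums(across(-code))) %>%
  filter(num_missing > 0)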

2.3.11.3 Data for question 3

Disasters

We begin by visualizing the missing values.

Code
variable_names <- names(data_question3_1)
missing_percentages <- sapply(data_question3_1, function(col) mean(is.na(col)) * 100)

missing_data_summary <- data.frame(
  Variable = variable_names,
  Missing_Percentage = missing_percentages
)

missing_data_summary <- missing_data_summary %>%
  mutate(VariableGroup = ifelse(startsWith(Variable, "goal") & Missing_Percentage == 0, "Goals without NAs", as.character(Variable)))

ggplot(data = missing_data_summary, aes(x = reorder(VariableGroup, Missing_Percentage), y = Missing_Percentage, fill = Missing_Percentage)) +
  geom_bar(stat = "identity") +
  geom_text(aes(label = ifelse(Missing_Percentage > 1, sprintf("%.1f%%", Missing_Percentage), ""),
                y = Missing_Percentage),
            position = position_stack(vjust = 1),  # Adjust vertical position
            color = "white",  # Text color
            size = 3,          # Text size
            hjust = 1.05) +
  labs(title = "Percentage of Missing Values by Variable",
       x = "Variable",
       y = "Missing Percentage") +
  theme_minimal() +
  theme(axis.text.y = element_text(hjust = 1)) +
  coord_flip()

In this particular case, even though there are many missing values in our disaster dataset, we assume that disasters do not happen every year in every country, given that they are uncontrollable and non-recurring events. We therefore replace the NAs with zeros, meaning that no climatic disaster occurred for that country and year.

Code
data_question3_1[is.na(data_question3_1)] <- 0
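
Note that the blanket assignment above sets every NA in the data frame to 0, including any that may remain in non-disaster columns such as goal scores. A minimal sketch of restricting the replacement to the disaster variables is given below. It uses a small hypothetical data frame (`toy`) and assumes that the disaster columns are `total_affected` and `total_deaths`.

```r
# Hypothetical toy data: goal1 has a genuine NA that should stay missing
toy <- data.frame(
  code           = c("AAA", "BBB", "CCC"),
  goal1          = c(50, NA, 70),
  total_affected = c(NA, 1200, NA),
  total_deaths   = c(NA, 15, 0)
)

# Assumed names of the disaster columns
disaster_cols <- c("total_affected", "total_deaths")

# Replace NAs with 0 only in the disaster columns
toy[disaster_cols] <- lapply(toy[disaster_cols], function(col) {
  col[is.na(col)] <- 0
  col
})

colSums(is.na(toy))  # number of NAs left per column
```

With this approach, missing scores stay missing and only the absence of a recorded disaster is coded as zero.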

COVID19

We look at the missing values of the three COVID-specific variables over the COVID years, 2020 to 2022. We remove the countries that have NAs (only stringency has countries with 100% NAs, 6 in total).

Code
COVID4 <- data_question3_2 %>%
  filter(year >= 2020 & year <= 2022) %>%
  group_by(code) %>%
  summarize(Na_deaths = mean(is.na(deaths_per_million)),
            Na_cases = mean(is.na(cases_per_million)),
            Na_stringency = mean(is.na(stringency))) %>%
  filter(Na_deaths != 0 | Na_cases!=0 |  Na_stringency !=0)

ggplot(COVID4, aes(x = reorder(code, Na_deaths), y = Na_deaths)) +
  geom_bar(stat = "identity", fill = "lightgreen", color = "black") +
  labs(title = "NAs by rows: deaths per million",
       x = "Code",
       y = "Proportion of Missing Values") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

ggplot(COVID4, aes(x = reorder(code, Na_cases), y = Na_cases)) +
  geom_bar(stat = "identity", fill = "lightgreen", color = "black") +
  labs(title = "NAs by rows: cases per million",
       x = "Code",
       y = "Proportion of Missing Values") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

ggplot(COVID4, aes(x = reorder(code, Na_stringency), y = Na_stringency)) +
  geom_bar(stat = "identity", fill = "lightgreen", color = "black") +
  labs(title = "NAs by rows: stringency",
       x = "Code",
       y = "Proportion of Missing Values") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

data_question3_2 <- data_question3_2 %>% filter(!code %in% COVID4$code)

We replace the NAs of the COVID columns for the other years (2000 to 2019) with 0, because these are not real missing values: they were only introduced by merging with the other databases.

Code
all_Merge <- all_Merge %>%
  mutate(
    cases_per_million = ifelse(is.na(cases_per_million), 0, cases_per_million),
    deaths_per_million = ifelse(is.na(deaths_per_million), 0, deaths_per_million),
    stringency = ifelse(is.na(stringency), 0, stringency)
  )

Conflicts

We create a column with the number of missing values by country over all the variables, except goal 1 and goal 10 that we already discussed. Two countries have missing values (MNE and SRB), so we remove them.

Code
see_missing3_3 <- data_question3_3 %>%
  group_by(code) %>%
  # count the NAs of every variable per country, except goal1 and goal10
  summarise(across(-c(goal1, goal10), ~ sum(is.na(.)))) %>%
  # total number of NAs per country over the remaining variables
  mutate(num_missing = rowSums(across(-code))) %>%
  filter(num_missing > 0)

data_question3_3 <- data_question3_3 %>% filter(!code %in% c("MNE","SRB"))

##### EXPORT as CSV #####
write.csv(data_question1, file = here("scripts","data","data_question1.csv"))
write.csv(data_question24, file = here("scripts","data","data_question24.csv"))
write.csv(data_question3_1, file = here("scripts","data","data_question3_1.csv"))
write.csv(data_question3_2, file = here("scripts","data","data_question3_2.csv"))
write.csv(data_question3_3, file = here("scripts","data","data_question3_3.csv"))

3 Exploratory data analysis

3.1 General exploration

We display the distribution of the different SDG achievement scores, using boxplots to have an overview of the median, the range with most of the observations and the outliers.

Code
data_question1 <- read.csv(here("scripts","data","data_question1.csv"))
data_question24 <- read.csv(here("scripts", "data", "data_question24.csv"))
data_question2 <- read.csv(here("scripts", "data", "data_question24.csv"))
data_question3_1 <- read.csv(here("scripts", "data", "data_question3_1.csv"))
data_question3_2 <- read.csv(here("scripts", "data", "data_question3_2.csv"))
data_question3_3 <- read.csv(here("scripts", "data", "data_question3_3.csv"))
Q3.1 <- read.csv(here("scripts", "data", "data_question3_1.csv"))
Q3.2 <- read.csv(here("scripts", "data", "data_question3_2.csv"))
Q3.3 <- read.csv(here("scripts", "data", "data_question3_3.csv"))
data <- read.csv(here("scripts", "data", "all_Merge.csv"))

Correlation_overall <- data_question1 %>% 
      select(population:ef_regulation)

#### boxplots ####

#for goals
#dev.off()
# boxplot(Correlation_overall[2:18], 
#         las = 2,            # Makes the axis labels perpendicular to the axis
#         par(mar = c(5, 4, 4, 2) + 0.1),  # Adjusts the margins to fit all labels
#         cex.axis = 0.7,      # Reduces the size of the axis labels
#         cex.lab = 1,       # Reduces the size of the x and y labels
#         notch = TRUE,       # Specifies whether to add notches or not
#         main = "Merged goals boxplot", # Title of the boxplot
#         xlab = "Goals",  # X-axis label
#         ylab = "Score")     # Y-axis label

#boxplot per continent

data_Q1_Africa <- data_question1 %>%
  filter(data_question1$continent == 'Africa')
data_Q1_Europe <- data_question1 %>%
  filter(data_question1$continent == 'Europe')
data_Q1_Asia <- data_question1 %>%
  filter(data_question1$continent == 'Asia')
data_Q1_Americas <- data_question1 %>%
  filter(data_question1$continent == 'Americas')
data_Q1_Oceania <- data_question1 %>%
  filter(data_question1$continent == 'Oceania')

#Africa
data_Q1_Africa_long <- melt(data_Q1_Africa[,8:24])
medians_AF <- data_Q1_Africa_long %>%
  group_by(variable) %>%
  summarize(median_value = median(value))
medians_AF$color <- ifelse(medians_AF$median_value > 75, "lightblue", 
                        ifelse(medians_AF$median_value < 25, "red", 'orange'))
data_Q1_Africa_long <- data_Q1_Africa_long %>%
  left_join(medians_AF, by = "variable")

bandwidth_nrd_AF <- bw.nrd(data_Q1_Africa_long$value)
AF <- ggplot(data_Q1_Africa_long, aes(x = variable, y = value, fill = color)) + 
  geom_violin(trim = FALSE, bw = bandwidth_nrd_AF) +  
  facet_grid(variable ~ ., scales = "free_y") +
  scale_fill_identity() +
  labs(title = "Africa SDG goals boxplot", x = "Goals", y = "Score") +
  geom_boxplot(width = 0.1, outlier.size = 1, fill = 'white') +
  scale_y_continuous(labels = scales::label_number()) +
  theme_classic() +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1))

#Europe
data_Q1_Europe_long <- melt(data_Q1_Europe[,8:24])

medians_EU <- data_Q1_Europe_long %>%
  group_by(variable) %>%
  summarize(median_value = median(value))

medians_EU$color <- ifelse(medians_EU$median_value > 75, "lightblue", 
                        ifelse(medians_EU$median_value < 25, "red", 'orange'))

data_Q1_Europe_long <- data_Q1_Europe_long %>%
  left_join(medians_EU, by = "variable")

bandwidth_nrd_EU <- bw.nrd(data_Q1_Europe_long$value)
EU <- ggplot(data_Q1_Europe_long, aes(x = variable, y = value, fill = color)) + 
  geom_violin(trim = FALSE, bw = bandwidth_nrd_EU) +
  scale_fill_identity() +
  labs(title = "European SDG goals boxplot", x = "Goals", y = "Score") +
  geom_boxplot(width = 0.1, outlier.size = 1, fill = 'white') +
  scale_y_continuous(labels = scales::label_number()) +
  theme_classic() +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1))

#Asia
data_Q1_Asia_long <- melt(data_Q1_Asia[,8:24])

medians_AS <- data_Q1_Asia_long %>%
  group_by(variable) %>%
  summarize(median_value = median(value))

medians_AS$color <- ifelse(medians_AS$median_value > 75, "lightblue", 
                        ifelse(medians_AS$median_value < 25, "red", 'orange'))

data_Q1_Asia_long <- data_Q1_Asia_long %>%
  left_join(medians_AS, by = "variable")

bandwidth_nrd_AS <- bw.nrd(data_Q1_Asia_long$value)
AS <- ggplot(data_Q1_Asia_long, aes(x = variable, y = value, fill = color)) + 
  geom_violin(trim = FALSE, bw = bandwidth_nrd_AS) +
  scale_fill_identity() +
  labs(title = "Asian SDG goals boxplot", x = "Goals", y = "Score") +
  geom_boxplot(width = 0.1, outlier.size = 1, fill = 'white') +
  scale_y_continuous(labels = scales::label_number()) +
  theme_classic() +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1))

#Americas
data_Q1_Americas_long <- melt(data_Q1_Americas[,8:24])

medians_AM <- data_Q1_Americas_long %>%
  group_by(variable) %>%
  summarize(median_value = median(value))

medians_AM$color <- ifelse(medians_AM$median_value > 75, "lightblue", 
                        ifelse(medians_AM$median_value < 25, "red", 'orange'))

data_Q1_Americas_long <- data_Q1_Americas_long %>%
  left_join(medians_AM, by = "variable")

bandwidth_nrd_AM <- bw.nrd(data_Q1_Americas_long$value)
AM <- ggplot(data_Q1_Americas_long, aes(x = variable, y = value, fill = color)) + 
  geom_violin(trim = FALSE, bw = bandwidth_nrd_AM) +
  scale_fill_identity() +
  labs(title = "American SDG goals boxplot", x = "Goals", y = "Score") +
  geom_boxplot(width = 0.1, outlier.size = 1, fill = 'white') +
  scale_y_continuous(labels = scales::label_number()) +
  theme_classic() +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1))

#Oceania
data_Q1_Oceania_long <- melt(data_Q1_Oceania[,8:24])

medians_OC <- data_Q1_Oceania_long %>%
  group_by(variable) %>%
  summarize(median_value = median(value))

medians_OC$color <- ifelse(medians_OC$median_value > 75, "lightblue", 
                        ifelse(medians_OC$median_value < 25, "red", 'orange'))

data_Q1_Oceania_long <- data_Q1_Oceania_long %>%
  left_join(medians_OC, by = "variable")

bandwidth_nrd_OC <- bw.nrd(data_Q1_Oceania_long$value)
OC <- ggplot(data_Q1_Oceania_long, aes(x = variable, y = value, fill = color)) + 
  geom_violin(trim = FALSE, bw = bandwidth_nrd_OC) +
  scale_fill_identity() +
  labs(title = "Oceanian SDG goals boxplot", x = "Goals", y = "Score") +
  geom_boxplot(width = 0.1, outlier.size = 1, fill = 'white') +
  scale_y_continuous(labels = scales::label_number()) +
  theme_classic() +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1))

grid.arrange(AF,EU,AS,AM,OC, ncol = 2, nrow = 3)

# Correlation_goals <- melt(Correlation_overall[,2:18])
# ggplot(Correlation_goals, aes(x= variable, y= value)) + 
#   geom_violin(trim=FALSE, fill="orange") +
#   labs(title="Merged goals violin boxplot",x="Goals", y = "Distribution") +
#   geom_boxplot(width=0.1, outlier.size = 1) +
#   scale_y_continuous(labels = scales::label_number()) + #limits = c(0, 100)
#   theme_classic() +
#   theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1))

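# Note: the violins extend below 0 and above 100 because trim = FALSE draws the density tails beyond the observed range; the scores themselves lie between 0 and 100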
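
The violins can reach below 0 and above 100 because a kernel density estimate that is not trimmed extends past the smallest and largest observations. A minimal base R sketch on simulated scores (not the project's data) reproduces the effect with `density()`, whose `cut` argument controls how far the estimate extends beyond the data range.

```r
set.seed(1)
# Simulated scores bounded in [0, 100]; purely illustrative
scores <- c(runif(200, 0, 100), 0, 100)

# Default cut = 3: the grid extends 3 bandwidths beyond the data range
d_untrimmed <- density(scores)
# cut = 0: the grid stops at the observed minimum and maximum
d_trimmed <- density(scores, cut = 0)

range(d_untrimmed$x)  # goes below 0 and above 100
range(d_trimmed$x)    # stays within the observed range
```

In ggplot2, `geom_violin(trim = TRUE)`, which is the default, clips the violins to the range of the data.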

Code
# Step 1: Combine all data into one data frame with a region identifier

# Melt the data
data_long <- data_question1 %>% 
  select(continent, overallscore, goal1, goal2, goal3, goal4, goal5, goal6, goal7, goal8, goal9, goal10, goal11, goal12, goal13, goal15, goal16, goal17) %>% 
  melt()

# Calculate medians and colors
medians <- data_long %>%
  group_by(variable) %>%
  summarize(median_value = median(value), .groups = 'drop')
medians$color <- ifelse(medians$median_value > 75, "lightblue", 
                        ifelse(medians$median_value < 25, "red", 'orange'))

# Join the medians back to the long data
data_long <- left_join(data_long, medians, by = "variable")

# Calculate the bandwidth
bandwidth_nrd <- bw.nrd(data_long$value)

# Create the plot
p <- ggplot(data_long, aes(x = variable, y = value, fill = color)) + 
  geom_violin(trim = FALSE, bw = bandwidth_nrd) +  
  #geom_boxplot(width = 0.1, outlier.size = 1, fill = 'white') +
  scale_fill_identity() +
  labs(title = "SDG Goals by Region", x = "Goals", y = "Score") +
  facet_grid(continent ~ ., scales = "free_y") +
  scale_y_continuous(labels = scales::label_number()) +
  theme_classic() +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1))

# Print the plot
print(p)

We see different patterns among the goals. Some are quite homogeneous, with a small spread of values (e.g. overall score, goals 2 and 8), while others have a large spread (e.g. goals 1 and 10). Goals 1, 3, 4, 7, 9, 10 and 13 have values across nearly the whole range of possible scores. Goals 2, 5, 8, 13 and 17 have outliers, i.e. extreme values lying beyond the boxplot whiskers (more than 1.5 times the interquartile range away from the quartiles). It is interesting to see that goal 8 (decent work and economic growth) has the smallest spread of values, whereas goal 1 (no poverty) has the largest distance between the first and the third quartile. Goal 2 (zero hunger) has a tight spread of values but the greatest number of outliers among the low values, meaning that the scores are similar across most countries, but where they differ, they are markedly worse.
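
The points drawn individually by `geom_boxplot()` are those lying more than 1.5 times the interquartile range beyond the quartiles. A minimal sketch of this rule on simulated scores (illustrative values, not our SDG data):

```r
set.seed(7)
# Simulated scores with a few very low values; illustrative only
scores <- c(rnorm(100, mean = 60, sd = 5), 20, 25, 28)

q1  <- quantile(scores, 0.25)
q3  <- quantile(scores, 0.75)
iqr <- q3 - q1

# Fences beyond which a point is drawn as an outlier
lower_fence <- q1 - 1.5 * iqr
upper_fence <- q3 + 1.5 * iqr

# Observations that would be plotted as individual outlier points
outliers <- scores[scores < lower_fence | scores > upper_fence]
sort(outliers)
```

A goal with a tight box and many points below the lower fence therefore corresponds to the pattern described for goal 2.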

We now display boxplots for the different variables of the Human Freedom Index, and then for our other independent variables.

Code
#for Human Freedom Index scores 

#Africa 
data_Q1_Africa_HFI_long <- melt(data_Q1_Africa[,29:40])

medians_HFI_AF <- data_Q1_Africa_HFI_long %>%
  group_by(variable) %>%
  summarize(median_value = median(value))

medians_HFI_AF$color <- ifelse(medians_HFI_AF$median_value > 7.5, "lightblue", 
                        ifelse(medians_HFI_AF$median_value < 2.5, "red", 'orange'))

data_Q1_Africa_HFI_long <- data_Q1_Africa_HFI_long %>%
  left_join(medians_HFI_AF, by = "variable")

bandwidth_nrd_HFI_AF <- bw.nrd(data_Q1_Africa_HFI_long$value)
HFI_AF <- ggplot(data_Q1_Africa_HFI_long, aes(x = variable, y = value, fill = color)) + 
  geom_violin(trim = FALSE, bw = bandwidth_nrd_HFI_AF) +
  scale_fill_identity() +
  labs(title = "African HFI Scores boxplot", x = "Human Freedom Index goals", y = "Score") +
  geom_boxplot(width = 0.1, outlier.size = 1, fill = 'white') +
  scale_y_continuous(labels = scales::label_number()) +
  theme_classic() +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1))
#Europe
data_Q1_Europe_HFI_long <- melt(data_Q1_Europe[,29:40])

medians_HFI_EU <- data_Q1_Europe_HFI_long %>%
  group_by(variable) %>%
  summarize(median_value = median(value))

medians_HFI_EU$color <- ifelse(medians_HFI_EU$median_value > 7.5, "lightblue", 
                        ifelse(medians_HFI_EU$median_value < 2.5, "red", 'orange'))

data_Q1_Europe_HFI_long <- data_Q1_Europe_HFI_long %>%
  left_join(medians_HFI_EU, by = "variable")

bandwidth_nrd_HFI_EU <- bw.nrd(data_Q1_Europe_HFI_long$value)
HFI_EU <- ggplot(data_Q1_Europe_HFI_long, aes(x = variable, y = value, fill = color)) + 
  geom_violin(trim = FALSE, bw = bandwidth_nrd_HFI_EU) +
  scale_fill_identity() +
  labs(title = "Europe HFI Scores boxplot", x = "Human Freedom Index goals", y = "Score") +
  geom_boxplot(width = 0.1, outlier.size = 1, fill = 'white') +
  scale_y_continuous(labels = scales::label_number()) +
  theme_classic() +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1))
#Asia
data_Q1_Asia_HFI_long <- melt(data_Q1_Asia[,29:40])

medians_HFI_AS <- data_Q1_Asia_HFI_long %>%
  group_by(variable) %>%
  summarize(median_value = median(value))

medians_HFI_AS$color <- ifelse(medians_HFI_AS$median_value > 7.5, "lightblue", 
                        ifelse(medians_HFI_AS$median_value < 2.5, "red", 'orange'))

data_Q1_Asia_HFI_long <- data_Q1_Asia_HFI_long %>%
  left_join(medians_HFI_AS, by = "variable")

bandwidth_nrd_HFI_AS <- bw.nrd(data_Q1_Asia_HFI_long$value)
HFI_AS <- ggplot(data_Q1_Asia_HFI_long, aes(x = variable, y = value, fill = color)) + 
  geom_violin(trim = FALSE, bw = bandwidth_nrd_HFI_AS) +
  scale_fill_identity() +
  labs(title = "Asian HFI Scores boxplot", x = "Human Freedom Index goals", y = "Score") +
  geom_boxplot(width = 0.1, outlier.size = 1, fill = 'white') +
  scale_y_continuous(labels = scales::label_number()) +
  theme_classic() +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1))
#America
data_Q1_America_HFI_long <- melt(data_Q1_Americas[,29:40])

medians_HFI_AM <- data_Q1_America_HFI_long %>%
  group_by(variable) %>%
  summarize(median_value = median(value))

medians_HFI_AM$color <- ifelse(medians_HFI_AM$median_value > 7.5, "lightblue", 
                        ifelse(medians_HFI_AM$median_value < 2.5, "red", 'orange'))

data_Q1_America_HFI_long <- data_Q1_America_HFI_long %>%
  left_join(medians_HFI_AM, by = "variable")

bandwidth_nrd_HFI_AM <- bw.nrd(data_Q1_America_HFI_long$value)
HFI_AM <- ggplot(data_Q1_America_HFI_long, aes(x = variable, y = value, fill = color)) + 
  geom_violin(trim = FALSE, bw = bandwidth_nrd_HFI_AM) +
  scale_fill_identity() +
  labs(title = "America HFI Scores boxplot", x = "Human Freedom Index goals", y = "Score") +
  geom_boxplot(width = 0.1, outlier.size = 1, fill = 'white') +
  scale_y_continuous(labels = scales::label_number()) +
  theme_classic() +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1))
#Oceania

data_Q1_Oceania_HFI_long <- melt(data_Q1_Oceania[,29:40])

medians_HFI_OC <- data_Q1_Oceania_HFI_long %>%
  group_by(variable) %>%
  summarize(median_value = median(value))

medians_HFI_OC$color <- ifelse(medians_HFI_OC$median_value > 7.5, "lightblue", 
                        ifelse(medians_HFI_OC$median_value < 2.5, "red", 'orange'))

data_Q1_Oceania_HFI_long <- data_Q1_Oceania_HFI_long %>%
  left_join(medians_HFI_OC, by = "variable")

bandwidth_nrd_HFI_OC <- bw.nrd(data_Q1_Oceania_HFI_long$value)
HFI_OC <- ggplot(data_Q1_Oceania_HFI_long, aes(x = variable, y = value, fill = color)) + 
  geom_violin(trim = FALSE, bw = bandwidth_nrd_HFI_OC) +
  scale_fill_identity() +
  labs(title = "Oceanian HFI Scores boxplot", x = "Human Freedom Index goals", y = "Score") +
  geom_boxplot(width = 0.1, outlier.size = 1, fill = 'white') +
  scale_y_continuous(labels = scales::label_number()) +
  theme_classic() +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1))

grid.arrange(HFI_AF,HFI_EU,HFI_AS,HFI_AM,HFI_OC, ncol = 2, nrow = 3)

# boxplot(Correlation_overall[23:34], 
#         las = 2,            # Makes the axis labels perpendicular to the axis
#         par(mar = c(7, 5, 2, 1)),  # Adjusts the margins to fit all labels
#         cex.axis = 0.7,      # Reduces the size of the axis labels
#         cex.lab = 1,       # Reduces the size of the x and y labels
#         notch = TRUE,       # Specifies whether to add notches or not
#         main = "Merged Human Freedom Index scores boxplot", 
#         ylab = "Score")     # Y-axis label


# Correlation_HFI <- melt(Correlation_overall[,23:34])
# ggplot(Correlation_HFI, aes(x= variable, y= value)) + 
#   geom_violin(trim=FALSE, fill="orange")+
#   labs(title="Merged Human Freedom Index scores violin boxplot",x="Variables", y = "Score")+
#   geom_boxplot(width=0.1, outlier.size = 1)+
#   scale_y_continuous(labels = scales::label_number()) + #limits = c(0, 100)
#   theme_classic() +
#   theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1))

v1 <- ggplot(Correlation_overall, aes(x= factor(1), y= GDPpercapita)) + 
  geom_violin(trim=FALSE, fill="orange")+
  labs(title="Violin plot of GDP per capita",x="GDP per capita", y = "Distribution")+
  geom_boxplot(width=0.1, outlier.size = 1)+
  scale_y_continuous(labels = scales::label_number()) +  # Format y-axis labels
  theme_classic()
v2 <- ggplot(Correlation_overall, aes(x= factor(1), y= unemployment.rate)) + 
  geom_violin(trim=FALSE, fill="orange")+
  labs(title="Violin plot of unemployment rate",x="Unemployment rate", y = "Distribution")+
  geom_boxplot(width=0.1, outlier.size = 1)+
  scale_y_continuous(labels = scales::label_number()) +  # Format y-axis labels
  theme_classic()
v3 <- ggplot(Correlation_overall, aes(x= factor(1), y= MilitaryExpenditurePercentGDP)) + 
  geom_violin(trim=FALSE, fill="orange")+
  labs(title="Violin plot of military expenditure by percentage of GDP",x="Military Expenditure", y = "Distribution")+
  geom_boxplot(width=0.1, outlier.size = 1)+
  scale_y_continuous(labels = scales::label_number()) +  # Format y-axis labels
  theme_classic()
v4 <- ggplot(Correlation_overall, aes(x= factor(1), y= internet_usage)) + 
  geom_violin(trim=FALSE, fill="orange")+
  labs(title="Violin plot of internet_usage",x="internet_usage", y = "Distribution")+
  geom_boxplot(width=0.1, outlier.size = 1)+
  scale_y_continuous(labels = scales::label_number()) +  # Format y-axis labels
  theme_classic()
grid.arrange(v1,v2,v3,v4, ncol = 2, nrow = 2)

We now look at the variables in a summary table to have a more precise view of the numbers.

Variable      Min.    1st Qu.  Median  Mean    3rd Qu.  Max.    NA's
X             1       955      1910    1910    2864     3818    0
year          2000    2005     2011    2011    2017     2022    0
overallscore  36.0    55.1     65.4    63.9    72.3     86.8    0
goal1         0       44       87      71      99       100     345
goal2         7.7     52.3     58.9    57.7    65.3     83.4    0
goal3         5.9     44.9     71.2    64.3    81.7     97.3    0
goal4         0.0     55.6     81.2    72.1    94.9     100.0   0
goal5         3.5     43.1     57.7    56.0    69.2     94.0    0
goal6         23.3    52.6     64.9    64.5    74.7     95.1    0
goal7         0.1     41.2     65.2    57.6    72.1     99.6    0
goal8         38.4    63.9     70.1    69.8    76.7     93.6    0
goal9         0.0     15.4     29.0    37.0    52.7     99.2    0
goal10        0       36       63      59      82       100     391
goal11        13.8    56.4     75.3    70.2    85.1     99.9    0
goal12        32.9    67.8     84.5    79.3    94.1     99.0    0
goal13        0.0     71.7     90.8    81.5    97.3     99.9    0
goal15        26.0    55.1     65.1    65.1    74.3     97.9    0
goal16        27.9    51.6     61.7    62.6    74.2     96.0    0
goal17        15.1    45.9     55.1    55.6    65.0     100.0   0

code, country, continent and region are character columns of length 3818.

3.2 Focus on the influence of the factors over the SDG scores

Using our cleaned dataset, we first want to observe how each of our variables is correlated with the others. For that, we use a heatmap. Given that most of our variables are not normally distributed, we use the Spearman method to calculate the correlations.

Code
#### Correlations between variables Heatmap ####

Correlation_overall <-data_question1 %>% # selection of the numerical data
      select(population:ef_regulation)

cor_matrix_sper <- # calculation of the correlation matrix
  cor(Correlation_overall, method = "spearman", use = "everything")

cor_melted <- # wide to long transformation
  melt(cor_matrix_sper)

ggplot(data = cor_melted, aes(Var1, Var2, fill = value)) +
  geom_tile() +
  scale_fill_gradient2(low = "blue", high = "red", mid = "white", 
                       midpoint = 0, limit = c(-1, 1), space = "Lab", 
                       name="Spearman\nCorrelation") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, vjust = 1, size = 8, hjust = 1),
        axis.text.y = element_text(size = 8)) +
  coord_fixed() +
  labs(x = '', y = '', title = 'Correlation Matrix Heatmap')

#do 3 different heatmaps : goals on goals, goals on other variables except goals, variables on variables (except goals)
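
As a reminder of what `method = "spearman"` computes, Spearman's coefficient is the Pearson correlation of the ranks, which is why it suits skewed variables with monotonic relationships. A small sketch on simulated data (the variables `x` and `y` are illustrative and not taken from our dataset):

```r
set.seed(42)
x <- rexp(100)                      # skewed, non-normal variable
y <- exp(x) + rnorm(100, sd = 0.1)  # monotonic but non-linear relationship

rho_builtin <- cor(x, y, method = "spearman")
rho_by_rank <- cor(rank(x), rank(y))  # Pearson correlation of the ranks

all.equal(rho_builtin, rho_by_rank)   # TRUE
cor(x, y)                             # Pearson on the raw values, for comparison
```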

In the correlation matrix heatmap, we observe that goals 1 to 11 are predominantly positively correlated. Conversely, goals 12 and 13 exhibit negative correlations with most variables, except between themselves where they are strongly correlated. Additionally, there is a notable strong correlation among personal freedom variables (pf), reflecting scores from the Human Freedom Index on movement, religion, assembly, and expression.

In order to have an overview of the relationship between our independent variables and the SDG overall score, we produce several graphs containing the Spearman correlation coefficient between each pair of variables, the scatter plots describing their relationship, as well as the distribution of each variable.

Code
#### Spearman's correlation coeff ####

panel.hist <- function(x, ...){ 
  usr <- par("usr"); on.exit(par(usr)) 
  par(usr = c(usr[1:2], 0, 1.5) ) 
  h <- hist(x, plot = FALSE) 
  breaks <- h$breaks; nB <- length(breaks) 
  y <- h$counts; y <- y/max(y) 
  rect(breaks[-nB], 0, breaks[-1], y, col = "lightgreen", ...)
}
panel.cor <- function(x, y, digits = 2, prefix = "", cex.cor, ...){ 
  usr <- par("usr"); on.exit(par(usr)) 
  par(usr = c(0, 1, 0, 1)) 
  r <- cor(x, y, method = "spearman") 
  txt <- format(c(r, 0.123456789), digits = digits)[1] 
  txt <- paste0(prefix, txt) 
  if(missing(cex.cor)) cex.cor <- 0.8/strwidth(txt) 
  text(0.5, 0.5, txt, cex = cex.cor * abs(r))  # scale the text size by |r|; cex must be positive
}

# # Independent variables 
pairs(Correlation_overall[,c("overallscore", "unemployment.rate", "GDPpercapita", "MilitaryExpenditurePercentGDP", "internet_usage")], upper.panel=panel.cor, diag.panel=panel.hist, main="Correlation table and distribution of various variables")

The overall SDG achievement score is highly correlated with the percentage of people using the internet (ρ = 0.79) and with GDP per capita (ρ = 0.60). The unemployment rate and military expenditure as a percentage of GDP do not seem to play a role. However, this holds only for the overall score.

The overall SDG achievement score is highly correlated with “personal freedom: law” (ρ = 0.69) and “personal freedom: identity” (ρ = 0.62). The other dimensions of personal freedom do not seem to be strongly associated with it. Regarding the distribution of the personal freedom variables, we notice that, except for law, all have left-skewed distributions, meaning that most of the countries have high scores.

The overall SDG achievement score is highly correlated with “economic freedom: legal” (ρ = 0.77), “economic freedom: trade” (ρ = 0.67) and “economic freedom: money” (ρ = 0.60), while the other dimensions of economic freedom do not seem to be strongly associated with it. Regarding the distribution of the economic freedom variables, we notice more heterogeneous distributions and scores across the various countries than for personal freedom.

3.2.1 Looking at SDGs

As we can see in the graph, most of our goals are correlated with one another. We will now perform a PCA to see how the variance of our variables is distributed across components. The scree plot of our PCA is shown below.

Code
#### PCA and PCA Scree plot####

myPCA_g <- PCA(data_question1[,9:24], graph = FALSE)
fviz_eig(myPCA_g,
         addlabels = TRUE) +
  theme_minimal()

As we can see, dimension 1 alone already explains more than 60% of the variation in our data. With dimension 2, this goes up to around 70%. We can now plot our data on the first two dimensions.

Code
#### PCA Biplot ####

fviz_pca_biplot(myPCA_g,
                label="var",
                col.var="dodgerblue3",
                geom="point",
                pointsize = 0.1,
                labelsize = 5) +
  theme_minimal()

Concerning the SDG goals, we conclude that most of our variables load on the first component, except goals 10 and 15, which are rather uncorrelated with dimension 1. In addition, as seen before, goals 12 and 13 are negatively correlated with the other goals. Since only the first two components have an eigenvalue greater than 1, we retain 2 dimensions, according to the Kaiser-Guttman rule. Nevertheless, together they explain less than 80% of the cumulative variance.
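
The Kaiser-Guttman rule can be checked numerically: on standardized variables, each eigenvalue is the variance carried by one component, and components with an eigenvalue above 1 are retained. A minimal sketch with base R's `prcomp()` on the built-in `USArrests` data, used here as a stand-in for our goal columns (this is not the `PCA()` call used in this report):

```r
# Stand-in data; replace with the numeric columns of interest
X <- USArrests

pca <- prcomp(X, center = TRUE, scale. = TRUE)

eigenvalues <- pca$sdev^2                      # variance of each component
prop_var    <- eigenvalues / sum(eigenvalues)  # share of total variance
cum_var     <- cumsum(prop_var)                # cumulative share

# Kaiser-Guttman rule: keep components whose eigenvalue exceeds 1
n_keep <- sum(eigenvalues > 1)

data.frame(component  = seq_along(eigenvalues),
           eigenvalue = round(eigenvalues, 2),
           cumulative = round(cum_var, 2))
```

Because the variables are standardized, the eigenvalues sum to the number of variables, so an eigenvalue above 1 means that a component carries more variance than a single original variable.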

Code
#### Forward selection ####

# library(MASS) is not needed here and would mask dplyr::select()
clean_data <- na.omit(data_question24)
# Initialize the data frame that stores the results
step_results <- data.frame(step = integer(), aic = numeric(), adjusted_r_squared = numeric())

# Initial model (null model)
current_model <- lm(overallscore ~ 1, data = clean_data)

# Record initial metrics
step_results <- rbind(step_results, data.frame(step = 0, aic = AIC(current_model), adjusted_r_squared = summary(current_model)$adj.r.squared))

# Add the goal variables one at a time, in column order
# (no candidate is compared or rejected, so this is not a selection in the strict sense)
for (variable in grep("goal", colnames(clean_data), value = TRUE)) {
    current_model <- update(current_model, as.formula(paste(". ~ . +", variable)))
    current_step <- nrow(step_results)  # step 0 is the null model, so the first addition is step 1
    step_results <- rbind(step_results, data.frame(step = current_step, aic = AIC(current_model), adjusted_r_squared = summary(current_model)$adj.r.squared))
}

# AIC and adjusted R-squared (rescaled to 0-100) share the same y-axis
ggplot(step_results, aes(x = step)) +
    geom_line(aes(y = aic, color = "AIC")) +
    geom_line(aes(y = adjusted_r_squared * 100, color = "Adjusted R-squared")) +
    labs(title = "Forward Selection Process", x = "Step", y = "Metric Value") +
    scale_color_manual("", breaks = c("AIC", "Adjusted R-squared"), values = c("blue", "red"))
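
The loop above adds the goal variables in column order. For comparison, a greedy forward selection picks at each step the candidate that lowers the AIC the most, and stops when no addition helps. A hedged sketch using base R's `step()` on the built-in `mtcars` data (the response `mpg` and its predictors are placeholders, not our SDG variables):

```r
# Placeholder data and response; substitute the cleaned dataset and overallscore
null_model <- lm(mpg ~ 1, data = mtcars)
full_model <- lm(mpg ~ ., data = mtcars)

forward_fit <- step(
  null_model,
  scope     = list(lower = formula(null_model), upper = formula(full_model)),
  direction = "forward",
  trace     = 0          # silence the step-by-step log
)

formula(forward_fit)     # variables retained, in order of entry
forward_fit$anova        # AIC after each addition
```

Since the overall score is built from the individual goals, we would expect such a procedure to keep most of them. The sketch mainly illustrates how the order of entry could be chosen by the data instead of by column position.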

3.2.2 Looking at the HFI scores

Code
#### PCA and PCA Scree plot####

myPCA_s <- PCA(data_question1[,29:40], graph = FALSE)
fviz_eig(myPCA_s,
         addlabels = TRUE) +
  theme_minimal()

Code
#### PCA Biplot ####
fviz_pca_biplot(myPCA_s,
                label="var",
                col.var="dodgerblue3",
                geom="point",
                pointsize = 0.1,
                labelsize = 5) +
  theme_minimal()

Concerning the Human Freedom Index scores, most of the variables are positively correlated with dimension 1, slightly less so for PF religion and security, and finally the EF government variable is uncorrelated with dimension 1. Since the first three components have an eigenvalue greater than 1, we retain 3 dimensions. Nevertheless, again, together they explain less than 80% of the cumulative variance.

Code
#### Kmean clustering ####

data1_scaled <- scale(Correlation_overall)
rownames(data1_scaled) <- seq_along(row.names(data1_scaled))
fviz_nbclust(data1_scaled, kmeans, method="wss")
kmean <- kmeans(data1_scaled, 7, nstart = 25)
fviz_cluster(kmean, data=data1_scaled, repel=FALSE, depth =NULL, ellipse.type = "norm", labelsize = 0, pointsize = 0.5)

### NOW CLUSTERING BY COUNTRY? AND TAKE MEAN OF EVERY VARIABLE ON EVERY CONCERNED YEAR?

Because of the large number of observations, visualizing the clusters obtained with the k-means method is not very informative. In addition, clustering aims to form groups that differ from each other while keeping little variation among the observations within the same cluster. Here, only 60.6% of the total variance is explained by the variation between clusters, which is not enough.
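The share of variance explained by the clusters is the ratio of the between-cluster sum of squares to the total sum of squares, both of which are stored in the object returned by `kmeans()`. The sketch below shows the computation on the built-in `USArrests` data with an arbitrary choice of 3 clusters; both are assumptions for illustration, not our data or our number of clusters.

```r
set.seed(1)  # k-means starts from random centers; fixing the seed makes the run reproducible

# Toy stand-in for the scaled data used above
toy_scaled <- scale(USArrests)
km <- kmeans(toy_scaled, centers = 3, nstart = 25)

# Total sum of squares = within-cluster part + between-cluster part
between_share <- km$betweenss / km$totss

# Percentage of the total variance explained by the separation between clusters
print(round(100 * between_share, 1))
```

A value close to 100% means compact, well-separated clusters. Adding clusters always increases this ratio, so it should be read together with the elbow plot.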

3.3 Focus on the influence of events over the SDG scores

To get an overview of the relationship between the event variables and the overall SDG score, we produce several plot matrices showing the Pearson correlation coefficient between each pair of variables, the scatter plots describing their relationship, and the distribution of each variable.

Code
lower.panel <- function(x, y, ...){
  points(x, y, pch = 20, col = "black", cex = 0.2)
}
evaluateCorrelationStars <- function(correlation) {
  if (abs(correlation) >= 0.7) {
    return("*****")  # Strong correlation: 5 stars
  } else if (abs(correlation) >= 0.5) {
    return("****")   # Moderate correlation: 4 stars
  } else if (abs(correlation) >= 0.3) {
    return("***")    # Fair correlation: 3 stars
  } else if (abs(correlation) >= 0.1) {
    return("**")     # Weak correlation: 2 stars
  } else {
    return("*")      # Very weak correlation: 1 star
  }
}

# panel.cor function with stars alongside correlation coefficients
panel.cor_stars <- function(x, y, digits = 2, prefix = "", cex.cor, ...) {
  usr <- par("usr"); on.exit(par(usr)) 
  par(usr = c(0, 1, 0, 1)) 
  r <- cor(x, y)
  stars <- evaluateCorrelationStars(r)
  txt <- paste0(format(c(r, 0.123456789), digits = digits)[1], " ", stars)
  if(missing(cex.cor)) cex.cor <- 0.5/strwidth(txt)
  text(0.5, 0.5, txt, cex = cex.cor)
}

pairs(data_question3_1[, c("overallscore", "total_affected", "total_deaths")], upper.panel = panel.cor_stars,diag.panel = panel.hist,lower.panel = lower.panel, main = "Correlation table and distribution of Disaster variables")

The variables used to capture the impact of climate disasters do not seem to have an important influence on the overall score. We will explore the individual SDGs further, since we believe that such disasters influence specific SDGs.

Code
lower.panel <- function(x, y, ...){
  points(x, y, pch = 20, col = "black", cex = 0.2)
}
evaluateCorrelationStars <- function(correlation) {
  if (abs(correlation) >= 0.7) {
    return("*****")  
  } else if (abs(correlation) >= 0.5) {
    return("****")
  } else if (abs(correlation) >= 0.3) {
    return("***") 
  } else if (abs(correlation) >= 0.1) {
    return("**")
  } else {
    return("*")
  }
}

# panel.cor function with stars alongside correlation coefficients
panel.cor_stars <- function(x, y, digits = 2, prefix = "", cex.cor, ...) {
  usr <- par("usr"); on.exit(par(usr)) 
  par(usr = c(0, 1, 0, 1)) 
  r <- cor(x, y)
  stars <- evaluateCorrelationStars(r)
  txt <- paste0(format(c(r, 0.123456789), digits = digits)[1], " ", stars)
  if(missing(cex.cor)) cex.cor <- 0.8/strwidth(txt)
  text(0.5, 0.5, txt, cex = cex.cor)
}

#ERROR TO RESOLVE (PROBABLY NAs SOMEWHERE)

# pairs(data_question3_2[,c("overallscore", "cases_per_million", "deaths_per_million", "stringency")], upper.panel = panel.cor_stars, diag.panel=panel.hist, lower.panel = lower.panel,main="Correlation table and distribution of COVID variables")
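The error noted above has not been diagnosed here. One plausible cause, consistent with the note about NAs, is that `cor()` returns `NA` as soon as one of the two vectors contains a missing value, and an `if` condition on `NA` then stops with an error. The sketch below reproduces this on made-up vectors and shows that `use = "complete.obs"` avoids it. Whether this is the actual cause in `data_question3_2` is an assumption.

```r
# Two toy vectors with missing values (made up for illustration)
x <- c(1, 2, 3, NA, 5)
y <- c(2, 4, 5, 8, NA)

r_default <- cor(x, y)                          # NA, because of the missing values
r_complete <- cor(x, y, use = "complete.obs")   # uses only the rows where both values are observed

# An `if` on an NA condition stops with an error; tryCatch() lets us capture it here
res <- tryCatch(
  if (abs(r_default) >= 0.7) "strong" else "weak",
  error = function(e) "error"
)

print(r_default)
print(r_complete)
print(res)
```

If this is indeed the cause, passing `use = "complete.obs"` to the `cor()` call inside the panel function should let the `pairs()` call run.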

The variables used to capture the impact of COVID-19 do not seem to have an important influence on the overall score. We will explore the individual SDGs further, since we believe that COVID-19 influenced specific SDGs, for instance “good health and well-being” or “decent work and economic growth”.

Code
lower.panel <- function(x, y, ...){
  points(x, y, pch = 20, col = "black", cex = 0.5)
}
evaluateCorrelationStars <- function(correlation) {
  if (abs(correlation) >= 0.7) {
    return("*****") 
  } else if (abs(correlation) >= 0.5) {
    return("****")
  } else if (abs(correlation) >= 0.3) {
    return("***") 
  } else if (abs(correlation) >= 0.1) {
    return("**")
  } else {
    return("*")
  }
}

# panel.cor function with stars alongside correlation coefficients
panel.cor_stars <- function(x, y, digits = 2, prefix = "", cex.cor, ...) {
  usr <- par("usr"); on.exit(par(usr)) 
  par(usr = c(0, 1, 0, 1)) 
  r <- cor(x, y)
  stars <- evaluateCorrelationStars(r)
  txt <- paste0(format(c(r, 0.123456789), digits = digits)[1], " ", stars)
  if(missing(cex.cor)) cex.cor <- 0.8/strwidth(txt)
  text(0.5, 0.5, txt, cex = cex.cor)
}

# ALSO ERROR

# pairs(data_question3_3[,c("overallscore", "ongoing", "sum_deaths", "pop_affected", "area_affected", "maxintensity")], upper.panel = panel.cor_stars, diag.panel=panel.hist, lower.panel = lower.panel, main="Correlation table and distribution of conflicts variables")

The variables used to capture the impact of conflicts do not seem to have an important influence on the overall score. We will explore the individual SDGs further, since we believe that conflicts influence specific SDGs.

To explore our data on events such as disasters, COVID-19 and conflicts, we first have to see which countries are most affected by them. To do so, we carry out a time-series analysis of each of these three events, using different variables each time.

Code
# Convert the 'year' column to Date, anchored on January 1 of each year
# (as.Date(..., format = "%Y") would silently fill in the current month and day)
Q3.1$year <- as.Date(paste0(Q3.1$year, "-01-01"))
Q3.2$year <- as.Date(paste0(Q3.2$year, "-01-01"))
Q3.3$year <- as.Date(paste0(Q3.3$year, "-01-01"))

This is our time-series analysis of COVID-19 cases per million by region, from the end of 2018 to 2022.

Code
library(ggplot2)
covid_filtered <- Q3.2[Q3.2$year >= as.Date("2018-12-12"), ]

ggplot(data = covid_filtered, aes(x = year, y = cases_per_million, group = region, color = region)) +
  geom_smooth(method = "loess",  se = FALSE, span = 0.8, size = 0.5) + 
  labs(title = "Trend of COVID-19 Cases per Million Over Time",
       x = "Year", y = "Cases per Million") +
  facet_wrap(~ region, ncol = 3) +
  theme( axis.text.x = element_text(angle = 45, size = 8, hjust = 1),
         axis.text.y = element_text(vjust = 1, size = 8, hjust = 1),
         plot.title = element_text(margin = margin(b = 20), hjust = 0.5, 
                                   vjust = 8, lineheight = 2),
         strip.text = element_blank(),
         panel.spacing = unit(0.5, "lines")
  ) +
  theme(legend.position = "bottom") +
  guides(color = guide_legend(nrow = 3))

This is our time-series analysis of COVID-19 deaths per million by region, from the end of 2018 to 2022.

Code
ggplot(data = covid_filtered, aes(x = year, y = deaths_per_million, group = region, color = region)) +
  geom_smooth(method = "loess",  se = FALSE, span = 0.8, size = 0.5) + 
  labs(title = "Trend of COVID-19 Deaths per Million Over Time", x = "Year", y = "Deaths per Million") +
  facet_wrap(~ region, nrow = 3) +
  theme( axis.text.x = element_text(angle = 45, size = 8, hjust = 1),
         axis.text.y = element_text(vjust = 1, size = 8, hjust = 1),
         plot.title = element_text(margin = margin(b = 20), hjust = 0.5, 
                                   vjust = 8, lineheight = 2),
         strip.text = element_blank(),
         panel.spacing = unit(0.5, "lines")
  ) +
  theme(legend.position = "bottom") +
  guides(color = guide_legend(nrow = 3))

This is our time-series analysis of COVID-19 stringency by region, from the end of 2018 to 2022.

Code
ggplot(data = covid_filtered, aes(x = year, y = stringency, group = region, 
                                  color = region)) +
  geom_smooth(method = "loess",  se = FALSE, span = 0.7, size = 0.5) + 
  labs(title = "Trend of COVID-19 Stringency Over Time", x = "Year", y = "Stringency") +
  facet_wrap(~ region, nrow = 4) +
  theme( axis.text.x = element_text(angle = 45, size = 8, hjust = 1),
         axis.text.y = element_text(vjust = 1, size = 8, hjust = 1),
         plot.title = element_text(margin = margin(b = 20), hjust = 0.5, 
                                   vjust = 8, lineheight = 2),
         strip.text = element_blank(),
         panel.spacing = unit(0.5, "lines")
  ) +
  theme(legend.position = "right") +
  guides(color = guide_legend(ncol = 1))

This is our time-series analysis of climate disasters, using the total number of people affected by region.

Code
Q3.1[is.na(Q3.1)] <- 0
ggplot(data = Q3.1, aes(x = year, y = total_affected, group = region, color = region)) +
  geom_smooth(method = "loess",  se = FALSE, span = 0.7, size = 0.5) + 
  labs(title = "Trend of Total Affected from Climatic Disasters Over Time", x = "Year", y = "Total Affected") +
  facet_wrap(~ region, nrow = 4) +
  theme( axis.text.x = element_text(angle = 45, size = 8, hjust = 1),
         axis.text.y = element_text(vjust = 1, size = 8, hjust = 1),
         plot.title = element_text(margin = margin(b = 20), hjust = 0.5, 
                                   vjust = 8, lineheight = 2),
         strip.text = element_blank(),
         panel.spacing = unit(0.5, "lines")
  ) +
  theme(legend.position = "right") +
  guides(color = guide_legend(ncol = 1))

This is our time-series analysis of conflict deaths by region between 2000 and 2016.

Code
conflicts_filtered <- Q3.3[Q3.3$year >= as.Date("2000-01-01") & Q3.3$year <= as.Date("2016-12-31"), ]

ggplot(data = conflicts_filtered, aes(x = year, y = sum_deaths, group = region, color = region)) +
  geom_smooth(method = "loess", se = FALSE, span = 0.3, size = 0.5) +  # Using loess smoothing method
  labs(title = "Trend of Deaths by Conflicts Over Time", x = "Year", y = "Sum Deaths") +
  facet_wrap(~ region, nrow = 4) +
  theme( axis.text.x = element_text(angle = 45, size = 8, hjust = 1),
         axis.text.y = element_text(vjust = 1, size = 8, hjust = 1),
         plot.title = element_text(margin = margin(b = 20), hjust = 0.5, 
                                   vjust = 8, lineheight = 2),
         strip.text = element_blank(),
         panel.spacing = unit(0.5, "lines")
  ) +
  theme(legend.position = "right") +
  guides(color = guide_legend(ncol = 1))

We can see that the regions most affected by conflict deaths are the Middle East & North Africa, Sub-Saharan Africa and South Asia, followed to a lesser extent by Latin America & the Caribbean and Eastern Europe.

This is our time-series analysis of the population affected by conflicts, by region, between 2000 and 2016.

Code
ggplot(data = conflicts_filtered, aes(x = year, y = pop_affected, group = region, color = region)) +
  geom_smooth(method = "loess", se = FALSE, span = 0.3, size = 0.5) +  # Using loess smoothing method
  labs(title = "Trend of Population Affected by Conflicts Over Time", x = "Year", y = "Population Affected") +
  facet_wrap(~ region, nrow = 4) +
  theme( axis.text.x = element_text(angle = 45, size = 8, hjust = 1),
         axis.text.y = element_text(vjust = 1, size = 8, hjust = 1),
         plot.title = element_text(margin = margin(b = 20), hjust = 0.5, 
                                   vjust = 8, lineheight = 2),
         strip.text = element_blank(),
         panel.spacing = unit(0.5, "lines")
  ) +
  theme(legend.position = "right") +
  guides(color = guide_legend(ncol = 1))

We can see that the regions with the largest population affected by conflicts are the Middle East & North Africa, Sub-Saharan Africa, South Asia, Latin America & the Caribbean, Eastern Europe and, at times, Caucasus and Central Asia.

Now that we have visualized which regions are most affected by these three events, we can carry out correlation analyses by region to see whether these events do have an impact on the evolution of the SDGs.

Here we analyse the correlation between climate disasters and the SDGs in South and East Asia.

Code
Q3.1[is.na(Q3.1)] <- 0

south_east_asia_data <- Q3.1[Q3.1$region %in% c("South Asia", "East Asia"), ]

relevant_columns <- c("goal1", "goal2", "goal3", "goal4", "goal5", "goal6", "goal7", "goal8", "goal9", "goal10", "goal11", "goal12", "goal13", "goal15", "goal16", "total_affected", "no_homeless")

correlation_matrix_disaster_Asia <- cor(south_east_asia_data[, relevant_columns], use = "complete.obs")

kable(correlation_matrix_disaster_Asia)
goal1 goal2 goal3 goal4 goal5 goal6 goal7 goal8 goal9 goal10 goal11 goal12 goal13 goal15 goal16 total_affected no_homeless
goal1 1.000 0.042 0.168 0.241 0.141 0.360 0.259 0.509 0.110 0.658 -0.047 0.198 0.321 0.250 0.061 -0.011 -0.025
goal2 0.042 1.000 0.589 0.457 0.560 0.552 0.468 0.550 0.644 -0.126 0.414 -0.192 -0.089 -0.305 0.449 0.105 -0.069
goal3 0.168 0.589 1.000 0.798 0.583 0.657 0.831 0.762 0.862 -0.248 0.810 -0.749 -0.624 -0.129 0.718 -0.029 -0.112
goal4 0.241 0.457 0.798 1.000 0.599 0.463 0.645 0.578 0.658 -0.161 0.493 -0.600 -0.533 -0.035 0.386 0.079 -0.023
goal5 0.141 0.560 0.583 0.599 1.000 0.554 0.449 0.445 0.433 -0.111 0.503 -0.376 -0.321 -0.182 0.348 0.052 -0.153
goal6 0.360 0.552 0.657 0.463 0.554 1.000 0.633 0.628 0.663 0.033 0.560 -0.457 -0.261 -0.156 0.565 -0.122 -0.202
goal7 0.259 0.468 0.831 0.645 0.449 0.633 1.000 0.672 0.756 -0.157 0.804 -0.555 -0.446 -0.139 0.555 -0.034 -0.067
goal8 0.509 0.550 0.762 0.578 0.445 0.628 0.672 1.000 0.711 0.202 0.557 -0.466 -0.288 -0.053 0.626 -0.008 -0.086
goal9 0.110 0.644 0.862 0.658 0.433 0.663 0.756 0.711 1.000 -0.161 0.660 -0.697 -0.563 -0.166 0.655 0.003 -0.072
goal10 0.658 -0.126 -0.248 -0.161 -0.111 0.033 -0.157 0.202 -0.161 1.000 -0.422 0.347 0.414 0.410 -0.074 -0.114 -0.017
goal11 -0.047 0.414 0.810 0.493 0.503 0.560 0.804 0.557 0.660 -0.422 1.000 -0.700 -0.629 -0.196 0.675 -0.134 -0.158
goal12 0.198 -0.192 -0.749 -0.600 -0.376 -0.457 -0.555 -0.466 -0.697 0.347 -0.700 1.000 0.899 -0.023 -0.687 0.124 0.122
goal13 0.321 -0.089 -0.624 -0.533 -0.321 -0.261 -0.446 -0.288 -0.563 0.414 -0.629 0.899 1.000 -0.182 -0.495 0.075 0.093
goal15 0.250 -0.305 -0.129 -0.035 -0.182 -0.156 -0.139 -0.053 -0.166 0.410 -0.196 -0.023 -0.182 1.000 0.051 -0.083 -0.034
goal16 0.061 0.449 0.718 0.386 0.348 0.565 0.555 0.626 0.655 -0.074 0.675 -0.687 -0.495 0.051 1.000 -0.150 -0.133
total_affected -0.011 0.105 -0.029 0.079 0.052 -0.122 -0.034 -0.008 0.003 -0.114 -0.134 0.124 0.075 -0.083 -0.150 1.000 0.149
no_homeless -0.025 -0.069 -0.112 -0.023 -0.153 -0.202 -0.067 -0.086 -0.072 -0.017 -0.158 0.122 0.093 -0.034 -0.133 0.149 1.000
Code

cor_melted <- as.data.frame(as.table(correlation_matrix_disaster_Asia))
names(cor_melted) <- c("Variable1", "Variable2", "Correlation")

ggplot(data = cor_melted, aes(Variable1, Variable2, fill = Correlation)) +
  geom_tile() +
  scale_fill_gradient2(low = "blue", high = "red", mid = "white",
                       midpoint = 0, limit = c(-1, 1), space = "Lab",
                       name = "Correlation") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, vjust = 1, size = 8, hjust = 1),
        axis.text.y = element_text(size = 8)) +
  coord_fixed() +
  labs(x = '', y = '',
       title = 'Correlation between the climate disasters and the SDG goals in South and East Asia')

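We conclude that climate disasters are only weakly correlated with the SDG scores in these regions: none of the correlations between total_affected or no_homeless and the individual goals exceeds 0.21 in absolute value.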
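Rather than scanning the full matrix by eye, the block of correlations between event variables and goals can be extracted directly. The sketch below does this on randomly generated toy data; the data frame `toy` and the helper names `goal_vars`, `event_vars` and `event_vs_goals` are hypothetical and introduced only for this example.

```r
set.seed(42)

# Toy data standing in for the goal scores and one event variable
toy <- data.frame(goal1 = rnorm(50),
                  goal2 = rnorm(50),
                  total_affected = rnorm(50))

cm <- cor(toy, use = "complete.obs")

# Split the column names into goals and event variables
goal_vars <- grep("^goal", colnames(cm), value = TRUE)
event_vars <- setdiff(colnames(cm), goal_vars)

# Keep only the block "event variables x goals" and find its largest absolute correlation
event_vs_goals <- cm[event_vars, goal_vars, drop = FALSE]
max_abs <- max(abs(event_vs_goals))

print(round(event_vs_goals, 3))
print(round(max_abs, 3))
```

The same indexing can be applied to any of the correlation matrices in this section by adapting the pattern that identifies the goal columns.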

Here we analyse the correlation between COVID-19 and the SDGs during the COVID-19 period only.

Code
covid_filtered <- Q3.2[Q3.2$year >= as.Date("2019-01-01"), ]

relevant_columns <- c("goal1", "goal2", "goal3", "goal4", "goal5", "goal6", "goal7", "goal8", "goal9", "goal10", "goal11", "goal12", "goal13", "goal15", "goal16", "stringency", "cases_per_million", "deaths_per_million")
# Subset data with relevant columns for correlation analysis
relevant_data <- covid_filtered[, relevant_columns]

correlation_matrix_Covid <- cor(relevant_data, use = "complete.obs")

kable(correlation_matrix_Covid)
goal1 goal2 goal3 goal4 goal5 goal6 goal7 goal8 goal9 goal10 goal11 goal12 goal13 goal15 goal16 stringency cases_per_million deaths_per_million
goal1 1.000 0.567 0.868 0.788 0.439 0.759 0.798 0.612 0.793 0.507 0.735 -0.649 -0.564 0.107 0.713 0.124 0.407 0.442
goal2 0.567 1.000 0.585 0.579 0.450 0.597 0.502 0.647 0.581 0.267 0.507 -0.344 -0.281 0.101 0.472 0.105 0.249 0.314
goal3 0.868 0.585 1.000 0.840 0.642 0.821 0.849 0.716 0.887 0.471 0.835 -0.786 -0.673 0.158 0.826 0.082 0.492 0.465
goal4 0.788 0.579 0.840 1.000 0.636 0.750 0.814 0.609 0.775 0.328 0.775 -0.640 -0.553 0.069 0.686 0.161 0.409 0.429
goal5 0.439 0.450 0.642 0.636 1.000 0.646 0.603 0.554 0.629 0.100 0.678 -0.641 -0.556 0.214 0.635 0.026 0.381 0.317
goal6 0.759 0.597 0.821 0.750 0.646 1.000 0.761 0.689 0.798 0.366 0.748 -0.701 -0.579 0.251 0.731 0.092 0.461 0.502
goal7 0.798 0.502 0.849 0.814 0.603 0.761 1.000 0.576 0.748 0.330 0.803 -0.649 -0.502 0.124 0.700 0.106 0.403 0.458
goal8 0.612 0.647 0.716 0.609 0.554 0.689 0.576 1.000 0.708 0.391 0.606 -0.639 -0.546 0.278 0.655 -0.019 0.437 0.374
goal9 0.793 0.581 0.887 0.775 0.629 0.798 0.748 0.708 1.000 0.473 0.757 -0.847 -0.756 0.186 0.829 0.066 0.541 0.439
goal10 0.507 0.267 0.471 0.328 0.100 0.366 0.330 0.391 0.473 1.000 0.311 -0.508 -0.485 0.233 0.518 -0.057 0.310 0.154
goal11 0.735 0.507 0.835 0.775 0.678 0.748 0.803 0.606 0.757 0.311 1.000 -0.684 -0.571 0.093 0.767 0.061 0.408 0.415
goal12 -0.649 -0.344 -0.786 -0.640 -0.641 -0.701 -0.649 -0.639 -0.847 -0.508 -0.684 1.000 0.874 -0.333 -0.829 0.029 -0.563 -0.377
goal13 -0.564 -0.281 -0.673 -0.553 -0.556 -0.579 -0.502 -0.546 -0.756 -0.485 -0.571 0.874 1.000 -0.212 -0.695 -0.006 -0.447 -0.233
goal15 0.107 0.101 0.158 0.069 0.214 0.251 0.124 0.278 0.186 0.233 0.093 -0.333 -0.212 1.000 0.307 -0.159 0.202 0.276
goal16 0.713 0.472 0.826 0.686 0.635 0.731 0.700 0.655 0.829 0.518 0.767 -0.829 -0.695 0.307 1.000 0.021 0.514 0.390
stringency 0.124 0.105 0.082 0.161 0.026 0.092 0.106 -0.019 0.066 -0.057 0.061 0.029 -0.006 -0.159 0.021 1.000 -0.275 0.215
cases_per_million 0.407 0.249 0.492 0.409 0.381 0.461 0.403 0.437 0.541 0.310 0.408 -0.563 -0.447 0.202 0.514 -0.275 1.000 0.358
deaths_per_million 0.442 0.314 0.465 0.429 0.317 0.502 0.458 0.374 0.439 0.154 0.415 -0.377 -0.233 0.276 0.390 0.215 0.358 1.000
Code

cor_melted <- as.data.frame(as.table(correlation_matrix_Covid))
names(cor_melted) <- c("Variable1", "Variable2", "Correlation")

ggplot(data = cor_melted, aes(Variable1, Variable2, fill = Correlation)) +
  geom_tile() +
  scale_fill_gradient2(low = "blue", high = "red", mid = "white",
                       midpoint = 0, limit = c(-1, 1), space = "Lab",
                       name = "Correlation") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, vjust = 1, size = 8, hjust = 1),
        axis.text.y = element_text(size = 8)) +
  coord_fixed() +
  labs(x = '', y = '',
       title = 'Correlation between COVID and the SDG goals')

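We reach the same conclusion for stringency, whose correlations with the goals never exceed 0.17 in absolute value. Surprisingly, cases and deaths per million are moderately and mostly positively correlated with the goals (for example 0.541 between cases_per_million and goal 9), which is counterintuitive.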

Here we analyse the correlation between conflict deaths and the SDGs, only for the Middle East & North Africa, Sub-Saharan Africa, South Asia, Latin America & the Caribbean and Eastern Europe regions.

Code

# Filter data for specific regions
selected_regions <- c("Middle East & North Africa", "Sub-Saharan Africa", "South Asia", "Latin America & the Caribbean", "Eastern Europe")
conflicts_selected <- Q3.3[Q3.3$region %in% selected_regions, ]

# Select relevant columns for the correlation analysis
relevant_columns <- c("goal1", "goal2", "goal3", "goal4", "goal5", "goal6", "goal7", "goal8", "goal9", "goal10", "goal11", "goal12", "goal13", "goal15", "goal16", "sum_deaths")

# Compute correlation matrix for the selected regions
correlation_matrix_Conflicts_Deaths <- cor(conflicts_selected[, relevant_columns], use = "complete.obs")

# View the correlation matrix
kable(correlation_matrix_Conflicts_Deaths)
goal1 goal2 goal3 goal4 goal5 goal6 goal7 goal8 goal9 goal10 goal11 goal12 goal13 goal15 goal16 sum_deaths
goal1 1.000 0.476 0.910 0.791 0.406 0.799 0.865 0.546 0.723 0.272 0.783 -0.730 -0.594 0.039 0.613 -0.095
goal2 0.476 1.000 0.544 0.531 0.540 0.638 0.531 0.571 0.530 0.102 0.475 -0.376 -0.322 0.154 0.430 -0.173
goal3 0.910 0.544 1.000 0.814 0.507 0.832 0.876 0.596 0.768 0.223 0.828 -0.745 -0.587 0.014 0.666 -0.117
goal4 0.791 0.531 0.814 1.000 0.645 0.748 0.808 0.536 0.696 0.089 0.768 -0.667 -0.533 0.007 0.496 -0.101
goal5 0.406 0.540 0.507 0.645 1.000 0.587 0.539 0.454 0.516 -0.178 0.620 -0.464 -0.351 0.191 0.384 -0.162
goal6 0.799 0.638 0.832 0.748 0.587 1.000 0.812 0.670 0.734 0.137 0.788 -0.711 -0.529 0.187 0.599 -0.166
goal7 0.865 0.531 0.876 0.808 0.539 0.812 1.000 0.539 0.720 0.152 0.841 -0.704 -0.531 0.039 0.566 -0.094
goal8 0.546 0.571 0.596 0.536 0.454 0.670 0.539 1.000 0.609 0.209 0.542 -0.519 -0.389 0.181 0.462 -0.102
goal9 0.723 0.530 0.768 0.696 0.516 0.734 0.720 0.609 1.000 0.300 0.698 -0.759 -0.689 0.137 0.591 -0.077
goal10 0.272 0.102 0.223 0.089 -0.178 0.137 0.152 0.209 0.300 1.000 0.035 -0.297 -0.299 0.118 0.275 0.078
goal11 0.783 0.475 0.828 0.768 0.620 0.788 0.841 0.542 0.698 0.035 1.000 -0.729 -0.570 0.031 0.656 -0.155
goal12 -0.730 -0.376 -0.745 -0.667 -0.464 -0.711 -0.704 -0.519 -0.759 -0.297 -0.729 1.000 0.865 -0.170 -0.666 0.122
goal13 -0.594 -0.322 -0.587 -0.533 -0.351 -0.529 -0.531 -0.389 -0.689 -0.299 -0.570 0.865 1.000 -0.150 -0.493 0.079
goal15 0.039 0.154 0.014 0.007 0.191 0.187 0.039 0.181 0.137 0.118 0.031 -0.170 -0.150 1.000 0.191 -0.063
goal16 0.613 0.430 0.666 0.496 0.384 0.599 0.566 0.462 0.591 0.275 0.656 -0.666 -0.493 0.191 1.000 -0.162
sum_deaths -0.095 -0.173 -0.117 -0.101 -0.162 -0.166 -0.094 -0.102 -0.077 0.078 -0.155 0.122 0.079 -0.063 -0.162 1.000
Code

# Melt the correlation matrix for ggplot2
cor_melted <- as.data.frame(as.table(correlation_matrix_Conflicts_Deaths))
names(cor_melted) <- c("Variable1", "Variable2", "Correlation")

# Create the heatmap
ggplot(data = cor_melted, aes(Variable1, Variable2, fill = Correlation)) +
  geom_tile() +
  scale_fill_gradient2(low = "blue", high = "red", mid = "white",
                       midpoint = 0, limit = c(-1, 1), space = "Lab",
                       name = "Correlation") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, vjust = 1, size = 8, hjust = 1),
        axis.text.y = element_text(size = 8)) +
  coord_fixed() +
  labs(x = '', y = '',
       title = 'Correlation between Conflicts deaths and the SDG goals')

Finally, we analyse the correlation between the population affected by conflicts and the SDGs, only for the Middle East & North Africa, Sub-Saharan Africa, South Asia, Latin America & the Caribbean, Eastern Europe, and Caucasus and Central Asia regions.

Code

# Filter data for specific regions (pop_affected)
selected_regions <- c("Middle East & North Africa", "Sub-Saharan Africa", "South Asia", "Latin America & the Caribbean", "Eastern Europe","Caucasus and Central Asia")
conflicts_selected <- Q3.3[Q3.3$region %in% selected_regions, ]

# Select relevant columns for the correlation analysis
relevant_columns <- c("goal1", "goal2", "goal3", "goal4", "goal5", "goal6", "goal7", "goal8", "goal9", "goal10", "goal11", "goal12", "goal13", "goal15", "goal16", "pop_affected")

# Compute correlation matrix for the selected regions
correlation_matrix_Conflicts_Pop_Affected <- cor(conflicts_selected[, relevant_columns], use = "complete.obs")

# View the correlation matrix
kable(correlation_matrix_Conflicts_Pop_Affected)
goal1 goal2 goal3 goal4 goal5 goal6 goal7 goal8 goal9 goal10 goal11 goal12 goal13 goal15 goal16 pop_affected
goal1 1.000 0.476 0.910 0.791 0.406 0.799 0.865 0.546 0.723 0.272 0.783 -0.730 -0.594 0.039 0.613 -0.066
goal2 0.476 1.000 0.544 0.531 0.540 0.638 0.531 0.571 0.530 0.102 0.475 -0.376 -0.322 0.154 0.430 -0.083
goal3 0.910 0.544 1.000 0.814 0.507 0.832 0.876 0.596 0.768 0.223 0.828 -0.745 -0.587 0.014 0.666 -0.058
goal4 0.791 0.531 0.814 1.000 0.645 0.748 0.808 0.536 0.696 0.089 0.768 -0.667 -0.533 0.007 0.496 -0.030
goal5 0.406 0.540 0.507 0.645 1.000 0.587 0.539 0.454 0.516 -0.178 0.620 -0.464 -0.351 0.191 0.384 -0.152
goal6 0.799 0.638 0.832 0.748 0.587 1.000 0.812 0.670 0.734 0.137 0.788 -0.711 -0.529 0.187 0.599 -0.106
goal7 0.865 0.531 0.876 0.808 0.539 0.812 1.000 0.539 0.720 0.152 0.841 -0.704 -0.531 0.039 0.566 -0.071
goal8 0.546 0.571 0.596 0.536 0.454 0.670 0.539 1.000 0.609 0.209 0.542 -0.519 -0.389 0.181 0.462 -0.099
goal9 0.723 0.530 0.768 0.696 0.516 0.734 0.720 0.609 1.000 0.300 0.698 -0.759 -0.689 0.137 0.591 0.000
goal10 0.272 0.102 0.223 0.089 -0.178 0.137 0.152 0.209 0.300 1.000 0.035 -0.297 -0.299 0.118 0.275 0.074
goal11 0.783 0.475 0.828 0.768 0.620 0.788 0.841 0.542 0.698 0.035 1.000 -0.729 -0.570 0.031 0.656 -0.103
goal12 -0.730 -0.376 -0.745 -0.667 -0.464 -0.711 -0.704 -0.519 -0.759 -0.297 -0.729 1.000 0.865 -0.170 -0.666 0.107
goal13 -0.594 -0.322 -0.587 -0.533 -0.351 -0.529 -0.531 -0.389 -0.689 -0.299 -0.570 0.865 1.000 -0.150 -0.493 0.021
goal15 0.039 0.154 0.014 0.007 0.191 0.187 0.039 0.181 0.137 0.118 0.031 -0.170 -0.150 1.000 0.191 -0.108
goal16 0.613 0.430 0.666 0.496 0.384 0.599 0.566 0.462 0.591 0.275 0.656 -0.666 -0.493 0.191 1.000 -0.099
pop_affected -0.066 -0.083 -0.058 -0.030 -0.152 -0.106 -0.071 -0.099 0.000 0.074 -0.103 0.107 0.021 -0.108 -0.099 1.000
Code

# Melt the correlation matrix for ggplot2
cor_melted <- as.data.frame(as.table(correlation_matrix_Conflicts_Pop_Affected))
names(cor_melted) <- c("Variable1", "Variable2", "Correlation")

# Create the heatmap
ggplot(data = cor_melted, aes(Variable1, Variable2, fill = Correlation)) +
  geom_tile() +
  scale_fill_gradient2(low = "blue", high = "red", mid = "white",
                       midpoint = 0, limit = c(-1, 1), space = "Lab",
                       name = "Correlation") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, vjust = 1, size = 8, hjust = 1),
        axis.text.y = element_text(size = 8)) +
  coord_fixed() +
  labs(x = '', y = '',
       title = 'Correlation between Conflicts Affected Population and the SDG goals')

4 Focus on the evolution of SDG scores over time

How has the adoption of the SDGs in 2015 influenced the achievement of SDGs?

Code
data_question2 <- read.csv(here("scripts", "data", "data_question24.csv"))
data_question2 <- data_question2 %>% select(-X)

4.1 EDA: General time evolution of SDG scores

First, we look at the evolution of the overall SDG achievement score over time, by continent and by region. We see that SDG scores around the world are increasing over the years, but very slowly.

Code
data2 <- data_question2 %>% group_by(year, continent) %>%
  mutate(mean_overall_score_by_year=mean(overallscore))

ggplot(data2) +
  geom_line(mapping=aes(x=year, y=mean_overall_score_by_year, color=continent), lwd=0.8) +
  geom_point(mapping=aes(x=year, y=mean_overall_score_by_year, color=continent), lwd=1.5) +
  scale_y_continuous(limits = c(0, 100)) +
  labs(title = "Evolution of the mean overall SDG achievement score",
       y = "Mean Overall SDG Score",
       x = "Year"
       )

Looking at the continents, we see that Europe is above the others, while Africa is below, but in general, all have increasing overall scores.

Code
data3 <- data_question2 %>% group_by(year, region) %>%
  mutate(mean_overall_score_by_year=mean(overallscore))

ggplot(data3) +
  geom_line(mapping=aes(x=year, y=mean_overall_score_by_year, color=region), lwd=0.8) +
  geom_point(mapping=aes(x=year, y=mean_overall_score_by_year, color=region), lwd=1.5) +
  scale_y_continuous(limits = c(0, 100)) +
  labs(title = "Evolution of the mean overall SDG achievement score",
       y = "Mean Overall SDG Score",
       x = "Year"
       )+
  theme(legend.position = "bottom")

Grouping the countries by region refines the previous picture: it is Western Europe in particular that is above the others, and Sub-Saharan Africa that is clearly below.

Second, we look at the evolution of the 16 SDG achievement scores over time, for the whole world and by continent. We notice that all SDGs except goal 9 (industry, innovation and infrastructure) are close to one another in terms of level and growth. Goal 9 starts far below the others in 2000 and grows faster until it exceeds 50%. In addition, some goals have barely increased their scores over the last two decades, for example goal 13 (climate action) and goal 12 (responsible consumption and production).

Code
data4 <- data_question2 %>%
  group_by(year) %>%
  summarise(across(starts_with("goal"), ~ mean(.x, na.rm = TRUE))) %>%
  pivot_longer(cols = starts_with("goal"), names_to = "goal", values_to = "mean_value")

color_palette <- c("red", "blue", "green", "orange", "purple", "pink", "lightblue", "gray", "cyan", "magenta", "yellow", "darkgreen", "darkblue", "darkred", "darkgrey", "darkcyan")

ggplot(data = data4) +
  geom_line(mapping = aes(x = year, y = mean_value, color = goal), size = 0.7) +
  geom_point(mapping = aes(x = year, y = mean_value, color = goal), size = 1) +
  scale_color_manual(values = color_palette) +
  scale_y_continuous(limits = c(0, 100)) +
  labs(title = "Evolution of the mean SDG achievement scores across the world",
       y = "Mean SDG Scores",
       x = "Year"
       ) 

We continue with the graph that distinguishes continents to get more information.

Code
data5 <- data_question2 %>%
  group_by(year, continent) %>%
  summarise(across(starts_with("goal"), ~ mean(.x, na.rm = TRUE))) %>%
  pivot_longer(cols = starts_with("goal"), names_to = "goal", values_to = "mean_value")

ggplot(data = data5) +
  geom_line(mapping = aes(x = year, y = mean_value, color=continent), size = 0.7) +
  scale_color_manual(values = color_palette) +
  scale_y_continuous(limits = c(0, 100)) +
  labs(title = "Evolution of the mean SDG achievement scores by continent",
       y = "Mean SDG Scores",
       x = "Years from 2000 to 2022"
       ) +
  facet_wrap(~ goal, nrow = 4)+
  scale_x_continuous(breaks = NULL)+
  theme_light()

We observe that, most of the time, Europe is at the top of the graph and Africa at the bottom, except for goals 12 and 13, which are linked to ecology. Some other observations stand out:

  • The Americas are far behind the other parts of the world regarding goal 10 (reduced inequalities).

  • Africa is far behind the other continents (even if it is improving) for goals 1, 3, 4 and 7.

  • Goal 9 (industry, innovation and infrastructure) shows exponential growth for almost all continents.

Third, we create an interactive map of the world that lets us navigate from 2000 to 2022 and see the level of achievement of the SDGs (overall score) for each country.

Code
# Load world map data
world <- ne_countries(scale = "medium", returnclass = "sf")

# Merge data with the world map data
data0 <- merge(world, data_question2, by.x = "iso_a3", by.y = "code", all.x = TRUE)

data0 <- data0 %>%
  filter(!is.na(overallscore))

unique_years <- unique(data0$year)

plot_ly(
  type = "choropleth",
  z = ~data0$overallscore[data0$year == 2000],
  locations = ~data0$iso_a3[data0$year == 2000],
  text = ~paste("Country: ", data0$name[data0$year == 2000], "<br>Overall Score: ", data0$overallscore[data0$year == 2000]),
  colors = c("darkred", "orange", "yellow", "darkgreen"),
  colorbar = list(title = "Overall Score", cmin = 40, cmax = 87),
  zmin = 40,
  zmax = 87,
  hoverinfo = "text"
) %>%
  layout(
    title = "SDG overall score evolution",
    sliders = list(
      list(
        active = 0,
        currentvalue = list(prefix = "Year: "),
        steps = lapply(seq_along(unique_years), function(i) {
          year <- unique_years[i]
          list(
            label = as.character(year),
            method = "restyle",
            args = list(
              list(
                z = list(data0$overallscore[data0$year == year]),
                locations = list(data0$iso_a3[data0$year == year]),
                text = list(~paste("Country: ", data0$name[data0$year == year], "<br>Overall Score: ", data0$overallscore[data0$year == year]))
              )
            )
          )
        })
      )
    )
  )

Again, we see that the overall SDG achievement score is increasing and that the countries with the most red (lowest scores) are in Africa. However, it is also there that the score increases most rapidly. Our hypothesis is that a very low score is easier to improve than a very high one (around 90%), because further gains at that level would approach perfection. In the next section, we investigate this idea further.
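
One way to probe this hypothesis, sketched here on made-up numbers rather than on the report's data, is to correlate each country's starting score with its average yearly change. The data frame `toy`, its three country codes and all its values are hypothetical; only the column names `code`, `year` and `overallscore` mirror those used in the code above. A negative correlation would be consistent with low-scoring countries improving faster.

```r
# Hypothetical toy panel: three countries observed over three years
toy <- data.frame(
  code = rep(c("AAA", "BBB", "CCC"), each = 3),
  year = rep(2000:2002, times = 3),
  overallscore = c(40, 43, 46,   # low starting score
                   60, 62, 64,   # medium starting score
                   80, 81, 82),  # high starting score
  stringsAsFactors = FALSE
)

# Make sure observations are in chronological order within each country
toy <- toy[order(toy$code, toy$year), ]
scores_by_country <- split(toy$overallscore, toy$code)

# Starting score and average year-to-year change for each country
start_score <- sapply(scores_by_country, function(s) s[1])
mean_change <- sapply(scores_by_country, function(s) mean(diff(s)))

# Negative values mean that lower starting scores go with faster improvement
catch_up_cor <- cor(start_score, mean_change)
print(catch_up_cor)
```

A correlation of this kind only describes the pattern; it does not by itself explain why low scorers improve faster.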

4.2 Analysis: SDG adoption in 2015

We create one new variable per goal that captures the difference in SDG score between the year of the observation and the previous year. This allows us to see whether countries improve on SDG scores from one year to the next. In addition, in preparation for the specific question about 2015, we keep only the years from 2009 to 2022 (7 years before and after 2015).

Code
binary2015 <- data_question2 %>% 
  group_by(code) %>%
  mutate(across(5:21, ~ . - dplyr::lag(.), .names = "diff_{.col}")) %>%
  ungroup()

# Create a new column (binary variable) with value 1 if the year is after 2015 and zero otherwise. 
binary2015 <- binary2015 %>% 
  mutate(after2015 = ifelse(year > 2015, 1, 0)) %>%
  filter(as.numeric(year)>=2009)

We begin by looking at the distribution of the difference in SDG scores from one year to the next (improvement if it is above zero and deterioration if it is below zero).

Code
# histogram of difference in scores between years
unique_years <- unique(binary2015$year)
plot_ly() %>%
  add_trace(
    type = "histogram", 
    data = binary2015, 
    x = ~diff_overallscore[year == 2009],
    marker = list(color = "lightgreen", line = list(color = "black", width = 1))
  ) %>%
  layout(
    title = "Distribution of SDG evolution",
    xaxis = list(title = "Year difference SDG score", range = c(-3, 3)),
    yaxis = list(title = "Frequency", range = c(0, 40)),
    sliders = list(
      list(
        active = 0,
        currentvalue = list(prefix = "Year: "),
        steps = lapply(seq_along(unique_years), function(i) {
          year <- unique_years[i]
          list(
            label = as.character(year),
            method = "restyle",
            args = list(
              list(x = list(binary2015$diff_overallscore[binary2015$year == year]))
            )
          )
        })
      )
    )
  )

We notice that, across the years, most of the distribution lies to the right of zero, which means that there are more improvements than deteriorations. When there is a deterioration, it is usually less than 1% per year, apart from a few extreme cases: in 2013, for instance, one country's overall SDG score fell by almost 3%. Improvements of more than 2% per year are also rare. Regarding our specific question, we do not see a major shift of the distribution after 2015. If the adoption had had a strong effect, the distribution would move further to the right; instead, except for 2017, more and more values are centered around zero, which means smaller score improvements overall.

Having visualized the improvements and declines in the overall SDG score for the whole world, we now turn to the top 5 countries in terms of improvement each year. Major improvements often come from countries in Sub-Saharan Africa or in the Middle East and North Africa. This is consistent with more effort being made in these regions to achieve better scores, but we also know from our previous visualizations that their initial scores are lower. Moreover, the highest improvements are around 3% per year and were mostly achieved before 2015. In terms of maximum improvements, the adoption of the SDGs in 2015 therefore did not have a strong impact. We also notice that 2020 is the year in which the best improvements are the smallest. We keep that in mind for the next question regarding events, and specifically COVID.
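
The visual comparison of the distributions can be complemented by two simple summaries per period: the mean yearly change and the share of observations that improve. The sketch below uses a small hypothetical data frame (`toy` and its values are invented for illustration); the column names `diff_overallscore` and `after2015` follow the ones created above.

```r
# Hypothetical yearly changes in the overall score (not the report's data)
toy <- data.frame(
  year = c(2013, 2014, 2015, 2016, 2017, 2018),
  diff_overallscore = c(0.6, 0.4, 0.5, 0.1, -0.2, 0.1)
)

# Same binary indicator as in the report: 1 for years after 2015
toy$after2015 <- ifelse(toy$year > 2015, 1, 0)

# Split the yearly changes by period ("0" = up to 2015, "1" = after 2015)
changes_by_period <- split(toy$diff_overallscore, toy$after2015)

# Mean yearly change and share of strictly positive changes in each period
mean_change <- sapply(changes_by_period, mean)
share_improving <- sapply(changes_by_period, function(d) mean(d > 0))

print(rbind(mean_change, share_improving))
```

If the adoption had shifted the distribution to the right, both summaries would be expected to be higher in the "1" column than in the "0" column.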

Code
top_n_values <- 5

# Build the plots with ggplot2
custom_colors <- c("blue", "darkblue", "cyan", "green", "darkgreen", "lightgreen", "lightblue","turquoise", "lightgrey", "darkgrey")

# Get unique regions in the dataset
unique_regions <- unique(binary2015$region)

# Create a color dictionary mapping each region to a specific color
region_colors <- setNames(custom_colors[1:length(unique_regions)], unique_regions)

library(patchwork)

plots <- list()

for (year in unique_years) {
  top_countries <- binary2015[binary2015$year == year, ] %>%
    arrange(desc(year), desc(diff_overallscore)) %>%
    head(n = top_n_values)
  
  plot <- ggplot(data = top_countries, mapping = aes(x = country, y = diff_overallscore, fill = region)) +
    geom_bar(stat = "identity") +
    scale_fill_manual(values = region_colors) +  # Use the specified colors
    labs(title = paste(year), x = NULL, y = NULL) +
    theme_minimal() +
    theme(axis.text.x = element_text(angle = 45, vjust = 1, hjust = 1, size= 6), legend.position = "none", plot.title = element_text(size = 10)) + 
    scale_y_continuous(limits = c(0, 3))
  
  plots[[as.character(year)]] <- plot
}

# Arrange the plots in a grid with 5 columns using patchwork
wrap <- wrap_plots(plots, ncol = 5)

wrap + plot_annotation(
  title = 'Best 5 countries in terms of SDG score improvement'
)

Code
# Create a common legend manually
legend_data <- data.frame(region = unique_regions)
legend_plot <- ggplot(legend_data, aes(x = region, fill = region)) +
  geom_bar(position = position_stack(reverse = TRUE)) +
  scale_fill_manual(values = region_colors) +
  labs(title = "Regions") +
  theme_void() +
  theme(
    legend.position = "none",
    axis.text.y = element_text(angle = 0, hjust = 1, size = 18),
    plot.title = element_text(size = 20, face = "bold")
  ) +
  coord_flip()

legend_plot

We continue with the worst 5 countries in terms of decline in the overall SDG score each year. The years with the worst declines are the most recent ones: declines were generally no more than 1% until 2018, after which larger declines became more frequent. The adoption of the SDGs in 2015 may have had a positive impact, because in the two years that followed the worst declines were small (no more than 1% in 2016 and no more than 0.5% in 2017). This stabilization was short-lived, however, and more extreme deteriorations followed. Interestingly, the regions with the worst declines varied widely over the past twelve years; the only pattern appears in the last four years, where most of them are in Latin America and the Caribbean.

Code
plots <- list()

for (year in unique_years) {
  top_countries <- binary2015[binary2015$year == year, ] %>%
    arrange(desc(year), diff_overallscore) %>%
    head(n = top_n_values)
  
  plot <- ggplot(data = top_countries, mapping = aes(x = country, y = diff_overallscore, fill = region)) +
    geom_bar(stat = "identity") +
    scale_fill_manual(values = region_colors) +  # Use the specified colors
    labs(title = paste(year), x = NULL, y = NULL) +
    theme_minimal() +
    theme(axis.text.x = element_text(angle = 45, vjust = 1, hjust = 1, size=6), legend.position = "none", plot.title = element_text(size = 10)) + 
    scale_y_continuous(limits = c(-3,0))
  
  plots[[as.character(year)]] <- plot
}

# Arrange the plots in a grid with 5 columns using patchwork
wrap <- wrap_plots(plots, ncol = 5)

wrap + plot_annotation(
  title = 'Worst 5 countries in terms of SDG score improvement'
)

Code
legend_plot

We move on to the individual SDG scores and look at the 20 best improvements for each score. We additionally differentiate between the improvements that occurred before and after 2015. We want to see which goals show the best improvements and which countries put the most effort into them.

Code
# Best improvements
data_long <- binary2015 %>%
  pivot_longer(cols = c(starts_with("diff_goal"), "diff_overallscore"),
               names_to = "goal", values_to = "improvement") %>%
  group_by(goal) %>%
  top_n(20, wt = improvement) %>%
  ungroup()

plot_ly() %>%
  add_trace(
    type = "bar",
    data = data_long,
    x = ~country[after2015 == 1 & goal == "diff_overallscore"],
    y = ~improvement[after2015 == 1 & goal == "diff_overallscore"],
    legendgroup = "after 2015",
    name = "after 2015",
    marker = list(color = "blue", size = 10),
    showlegend = TRUE
  ) %>%
  add_trace(
    type = "bar",
    x = ~country[after2015 == 0 & goal == "diff_overallscore"],
    y = ~improvement[after2015 == 0 & goal == "diff_overallscore"],
    legendgroup = "before 2015",
    name = "before 2015",
    marker = list(color = "red", size = 10),
    showlegend = TRUE
  ) %>%
  layout(
    title = paste("Top 20 countries per SDG Score evolution"),
    yaxis = list(title = "Year difference SDG score", range = c(0, 50)),
    xaxis = list(title = "Countries", categoryorder = "total ascending"),
    barmode = "stack",
    updatemenus = list(
      list(
        buttons = list(
          list(
            args = list(
              list(
                y = list(
                  ~improvement[after2015 == 1 & goal == "diff_overallscore"],
                  ~improvement[after2015 == 0 & goal == "diff_overallscore"]
                ),
                x = list(
                  ~country[after2015 == 1 & goal == "diff_overallscore"],
                  ~country[after2015 == 0 & goal == "diff_overallscore"]
                )
              )
            ),
            label = "Overall score",
            method = "restyle"
          ),
          list(
            args = list(
              list(
                y = list(
                  ~improvement[after2015 == 1 & goal == "diff_goal1"],
                  ~improvement[after2015 == 0 & goal == "diff_goal1"]
                ),
                x = list(
                  ~country[after2015 == 1 & goal == "diff_goal1"],
                  ~country[after2015 == 0 & goal == "diff_goal1"]
                )
              )
            ),
            label = "Goal 1",
            method = "restyle"
          ),
          list(
            args = list(
              list(
                y = list(
                  ~improvement[after2015 == 1 & goal == "diff_goal2"],
                  ~improvement[after2015 == 0 & goal == "diff_goal2"]
                ),
                x = list(
                  ~country[after2015 == 1 & goal == "diff_goal2"],
                  ~country[after2015 == 0 & goal == "diff_goal2"]
                )
              )
            ),
            label = "Goal 2",
            method = "restyle"
          ),
          list(
            args = list(
              list(
                y = list(
                  ~improvement[after2015 == 1 & goal == "diff_goal3"],
                  ~improvement[after2015 == 0 & goal == "diff_goal3"]
                ),
                x = list(
                  ~country[after2015 == 1 & goal == "diff_goal3"],
                  ~country[after2015 == 0 & goal == "diff_goal3"]
                )
              )
            ),
            label = "Goal 3",
            method = "restyle"
          ),
          list(
            args = list(
              list(
                y = list(
                  ~improvement[after2015 == 1 & goal == "diff_goal4"],
                  ~improvement[after2015 == 0 & goal == "diff_goal4"]
                ),
                x = list(
                  ~country[after2015 == 1 & goal == "diff_goal4"],
                  ~country[after2015 == 0 & goal == "diff_goal4"]
                )
              )
            ),
            label = "Goal 4",
            method = "restyle"
          ),
          list(
            args = list(
              list(
                y = list(
                  ~improvement[after2015 == 1 & goal == "diff_goal5"],
                  ~improvement[after2015 == 0 & goal == "diff_goal5"]
                ),
                x = list(
                  ~country[after2015 == 1 & goal == "diff_goal5"],
                  ~country[after2015 == 0 & goal == "diff_goal5"]
                )
              )
            ),
            label = "Goal 5",
            method = "restyle"
          ),
          list(
            args = list(
              list(
                y = list(
                  ~improvement[after2015 == 1 & goal == "diff_goal6"],
                  ~improvement[after2015 == 0 & goal == "diff_goal6"]
                ),
                x = list(
                  ~country[after2015 == 1 & goal == "diff_goal6"],
                  ~country[after2015 == 0 & goal == "diff_goal6"]
                )
              )
            ),
            label = "Goal 6",
            method = "restyle"
          ),
          list(
            args = list(
              list(
                y = list(
                  ~improvement[after2015 == 1 & goal == "diff_goal7"],
                  ~improvement[after2015 == 0 & goal == "diff_goal7"]
                ),
                x = list(
                  ~country[after2015 == 1 & goal == "diff_goal7"],
                  ~country[after2015 == 0 & goal == "diff_goal7"]
                )
              )
            ),
            label = "Goal 7",
            method = "restyle"
          ),
          list(
            args = list(
              list(
                y = list(
                  ~improvement[after2015 == 1 & goal == "diff_goal8"],
                  ~improvement[after2015 == 0 & goal == "diff_goal8"]
                ),
                x = list(
                  ~country[after2015 == 1 & goal == "diff_goal8"],
                  ~country[after2015 == 0 & goal == "diff_goal8"]
                )
              )
            ),
            label = "Goal 8",
            method = "restyle"
          ),
          list(
            args = list(
              list(
                y = list(
                  ~improvement[after2015 == 1 & goal == "diff_goal9"],
                  ~improvement[after2015 == 0 & goal == "diff_goal9"]
                ),
                x = list(
                  ~country[after2015 == 1 & goal == "diff_goal9"],
                  ~country[after2015 == 0 & goal == "diff_goal9"]
                )
              )
            ),
            label = "Goal 9",
            method = "restyle"
          ),
          list(
            args = list(
              list(
                y = list(
                  ~improvement[after2015 == 1 & goal == "diff_goal10"],
                  ~improvement[after2015 == 0 & goal == "diff_goal10"]
                ),
                x = list(
                  ~country[after2015 == 1 & goal == "diff_goal10"],
                  ~country[after2015 == 0 & goal == "diff_goal10"]
                )
              )
            ),
            label = "Goal 10",
            method = "restyle"
          ),
          list(
            args = list(
              list(
                y = list(
                  ~improvement[after2015 == 1 & goal == "diff_goal11"],
                  ~improvement[after2015 == 0 & goal == "diff_goal11"]
                ),
                x = list(
                  ~country[after2015 == 1 & goal == "diff_goal11"],
                  ~country[after2015 == 0 & goal == "diff_goal11"]
                )
              )
            ),
            label = "Goal 11",
            method = "restyle"
          ),
          list(
            args = list(
              list(
                y = list(
                  ~improvement[after2015 == 1 & goal == "diff_goal12"],
                  ~improvement[after2015 == 0 & goal == "diff_goal12"]
                ),
                x = list(
                  ~country[after2015 == 1 & goal == "diff_goal12"],
                  ~country[after2015 == 0 & goal == "diff_goal12"]
                )
              )
            ),
            label = "Goal 12",
            method = "restyle"
          ),
          list(
            args = list(
              list(
                y = list(
                  ~improvement[after2015 == 1 & goal == "diff_goal13"],
                  ~improvement[after2015 == 0 & goal == "diff_goal13"]
                ),
                x = list(
                  ~country[after2015 == 1 & goal == "diff_goal13"],
                  ~country[after2015 == 0 & goal == "diff_goal13"]
                )
              )
            ),
            label = "Goal 13",
            method = "restyle"
          ),
          list(
            args = list(
              list(
                y = list(
                  ~improvement[after2015 == 1 & goal == "diff_goal15"],
                  ~improvement[after2015 == 0 & goal == "diff_goal15"]
                ),
                x = list(
                  ~country[after2015 == 1 & goal == "diff_goal15"],
                  ~country[after2015 == 0 & goal == "diff_goal15"]
                )
              )
            ),
            label = "Goal 15",
            method = "restyle"
          ),
          list(
            args = list(
              list(
                y = list(
                  ~improvement[after2015 == 1 & goal == "diff_goal16"],
                  ~improvement[after2015 == 0 & goal == "diff_goal16"]
                ),
                x = list(
                  ~country[after2015 == 1 & goal == "diff_goal16"],
                  ~country[after2015 == 0 & goal == "diff_goal16"]
                )
              )
            ),
            label = "Goal 16",
            method = "restyle"
          ),
          list(
            args = list(
              list(
                y = list(
                  ~improvement[after2015 == 1 & goal == "diff_goal17"],
                  ~improvement[after2015 == 0 & goal == "diff_goal17"]
                ),
                x = list(
                  ~country[after2015 == 1 & goal == "diff_goal17"],
                  ~country[after2015 == 0 & goal == "diff_goal17"]
                )
              )
            ),
            label = "Goal 17",
            method = "restyle"
          )
        )
      )
    )
  )
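
The dropdown above repeats the same button definition once per goal. As a possible simplification, the list of buttons could be generated programmatically. The sketch below is only an illustration under assumptions: `toy_long`, `make_button`, `goal_names` and `labels` are hypothetical names, the data are invented, and the vectors are computed eagerly instead of through formulas.

```r
# Hypothetical long-format data, shaped like data_long above
toy_long <- data.frame(
  country = c("A", "B", "A", "B", "A", "B", "A", "B"),
  goal = rep(c("diff_overallscore", "diff_goal1"), each = 4),
  after2015 = rep(c(1, 1, 0, 0), times = 2),
  improvement = c(1.2, 0.8, 2.1, 1.7, 5.0, 4.0, 9.0, 7.5),
  stringsAsFactors = FALSE
)

# Build one "restyle" button that swaps x and y of both traces
# (trace 1: after 2015, trace 2: before 2015) for a given goal
make_button <- function(goal_name, label, data) {
  after <- data[data$goal == goal_name & data$after2015 == 1, ]
  before <- data[data$goal == goal_name & data$after2015 == 0, ]
  list(
    method = "restyle",
    label = label,
    args = list(list(
      x = list(after$country, before$country),
      y = list(after$improvement, before$improvement)
    ))
  )
}

goal_names <- c("diff_overallscore", "diff_goal1")
labels <- c("Overall score", "Goal 1")

# One button per goal; unname() keeps the result a plain (unnamed) list
buttons <- unname(Map(make_button, goal_names, labels,
                      MoreArgs = list(data = toy_long)))
str(buttons, max.level = 2)
```

Such a list could then be supplied as `updatemenus = list(list(buttons = buttons))` in `layout()`, with `goal_names` and `labels` extended to all goals.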

We notice various patterns, among them:

  • Goals 2 (zero hunger), 3 (good health and well-being), 6 (clean water and sanitation), 8 (decent work and economic growth), 12 (responsible consumption and production), 16 (peace, justice and strong institutions) have very low improvements per year. Indeed, even the best ones are below 10%.

  • Goal 10 (reduced inequalities) has the best improvements: all 20 best improvements are above 20%, and they go up to 45%.

  • Some goals clearly had most of their best improvements before 2015: goals 3 (good health and well-being), 5 (gender equality), 6 (clean water and sanitation), 7 (affordable and clean energy).

  • Some goals clearly had most of their best improvements after 2015: goals 8 (decent work and economic growth), 12 (responsible consumption and production).

Regarding the impact of the adoption of the SDGs in 2015, we cannot say that it had a positive impact, because there are no more large improvements after 2015 than before, and even slightly fewer. In addition, the most impressive improvements mostly occurred before 2015. These conclusions are supported by the next graph: we fit two different regression lines (before and after 2015) to see whether there is a jump after the adoption and whether the SDG scores increase faster.
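
The logic of comparing two fitted lines can be illustrated on a toy series. In the sketch below, the data frame `toy` and its values are hypothetical and `year` is assumed to be numeric; a faster increase after 2015 would show up as a larger slope, and a jump as a positive gap between the two fitted values at 2015.

```r
# Hypothetical yearly scores (not the report's data)
toy <- data.frame(
  year = 2010:2020,
  overallscore = c(60, 61, 62, 63, 64, 65, 65.5, 66, 66.5, 67, 67.5)
)

# Separate linear fits up to 2015 and from 2015 onwards
fit_before <- lm(overallscore ~ year, data = toy[toy$year <= 2015, ])
fit_after <- lm(overallscore ~ year, data = toy[toy$year >= 2015, ])

# Slopes: average yearly gain in each period
slope_before <- coef(fit_before)[["year"]]
slope_after <- coef(fit_after)[["year"]]

# Gap between the two fitted lines at 2015 (a positive gap would be a "jump")
at_2015 <- data.frame(year = 2015)
jump_2015 <- unname(predict(fit_after, at_2015) - predict(fit_before, at_2015))

print(c(slope_before = slope_before, slope_after = slope_after, jump = jump_2015))
```

In this invented series the slope is smaller after 2015 and there is no jump, which is the kind of pattern that would argue against a "push" from the adoption.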

Code
# Graphs to show the jump (or not) in 2015

# Filter data
data_after_2015 <- filter(binary2015, as.numeric(year) >= 2015)
data_before_2016 <- filter(binary2015, as.numeric(year) <= 2015)

plotly::plot_ly() %>%
  plotly::add_trace(data = data_after_2015, x = ~year, y = ~fitted(lm(overallscore ~ year, data = data_after_2015)), type = 'scatter', mode = 'lines', line = list(color = 'blue'), name = "After 2015") %>%
  plotly::add_trace(data = data_before_2016, x = ~year, y = ~fitted(lm(overallscore ~ year, data = data_before_2016)), type = 'scatter', mode = 'lines', line = list(color = 'red'), name = "Before 2015") %>%
  plotly::layout(title = "Different patterns across SDGs before and after 2015",
         xaxis = list(title = "Year"),
         yaxis = list(title = "SDG achievement score", range = c(30, 85)),
         shapes = list(
           list(
             type = 'line',
             x0 = 2015,
             x1 = 2015,
             y0 = 0,
             y1 = 1,
             yref = 'paper',
             line = list(color = 'grey', width = 2, dash = 'dot')
           )
         ),
         updatemenus = list(
           list(
             buttons = list(
               list(
                 args = list("y", list(
                   ~fitted(lm(overallscore ~ year, data = data_after_2015)),
                   ~fitted(lm(overallscore ~ year, data = data_before_2016))
                 )),
                 label = "Overall score",
                 method = "restyle"
               ),
               list(
                 args = list("y", list(
                   ~fitted(lm(goal1 ~ year, data = data_after_2015)),
                   ~fitted(lm(goal1 ~ year, data = data_before_2016))
                 )),
                 label = "Goal 1: \nno poverty",
                 method = "restyle"
               ),
               list(
                 args = list("y", list(
                   ~fitted(lm(goal2 ~ year, data = data_after_2015)),
                   ~fitted(lm(goal2 ~ year, data = data_before_2016))
                 )),
                 label = "Goal 2: \nzero hunger",
                 method = "restyle"
               ),
               list(
                 args = list("y", list(
                   ~fitted(lm(goal3 ~ year, data = data_after_2015)),
                   ~fitted(lm(goal3 ~ year, data = data_before_2016))
                 )),
                 label = "Goal 3: good health \nand well-being",
                 method = "restyle"
               ),
               list(
                 args = list("y", list(
                   ~fitted(lm(goal4 ~ year, data = data_after_2015)),
                   ~fitted(lm(goal4 ~ year, data = data_before_2016))
                 )),
                 label = "Goal 4: \nquality education",
                 method = "restyle"
               ),
               list(
                 args = list("y", list(
                   ~fitted(lm(goal5 ~ year, data = data_after_2015)),
                   ~fitted(lm(goal5 ~ year, data = data_before_2016))
                 )),
                 label = "Goal 5: \ngender equality",
                 method = "restyle"
               ), 
               list(
                 args = list("y", list(
                   ~fitted(lm(goal6 ~ year, data = data_after_2015)),
                   ~fitted(lm(goal6 ~ year, data = data_before_2016))
                 )),
                 label = "Goal 6: clean water \nand sanitation",
                 method = "restyle"
               ),
               list(
                 args = list("y", list(
                   ~fitted(lm(goal7 ~ year, data = data_after_2015)),
                   ~fitted(lm(goal7 ~ year, data = data_before_2016))
                 )),
                 label = "Goal 7: affordable \nand clean energy",
                 method = "restyle"
               ),
               list(
                 args = list("y", list(
                   ~fitted(lm(goal8 ~ year, data = data_after_2015)),
                   ~fitted(lm(goal8 ~ year, data = data_before_2016))
                 )),
                 label = "Goal 8: decent work \nand economic growth",
                 method = "restyle"
               ),
               list(
                 args = list("y", list(
                   ~fitted(lm(goal9 ~ year, data = data_after_2015)),
                   ~fitted(lm(goal9 ~ year, data = data_before_2016))
                 )),
                 label = "Goal 9: industry, innovation \nand infrastructure",
                 method = "restyle"
               ), 
               list(
                 args = list("y", list(
                   ~fitted(lm(goal10 ~ year, data = data_after_2015)),
                   ~fitted(lm(goal10 ~ year, data = data_before_2016))
                 )),
                 label = "Goal 10: \nreduced inequalities",
                 method = "restyle"
               ),
               list(
                 args = list("y", list(
                   ~fitted(lm(goal11 ~ year, data = data_after_2015)),
                   ~fitted(lm(goal11 ~ year, data = data_before_2016))
                 )),
                 label = "Goal 11: sustainable \ncities and communities",
                 method = "restyle"
               ),
               list(
                 args = list("y", list(
                   ~fitted(lm(goal12 ~ year, data = data_after_2015)),
                   ~fitted(lm(goal12 ~ year, data = data_before_2016))
                 )),
                 label = "Goal 12: responsible \nconsumption and production",
                 method = "restyle"
               ),
               list(
                 args = list("y", list(
                   ~fitted(lm(goal13 ~ year, data = data_after_2015)),
                   ~fitted(lm(goal13 ~ year, data = data_before_2016))
                 )),
                 label = "Goal 13: \nclimate action",
                 method = "restyle"
               ), 
               list(
                 args = list("y", list(
                   ~fitted(lm(goal15 ~ year, data = data_after_2015)),
                   ~fitted(lm(goal15 ~ year, data = data_before_2016))
                 )),
                 label = "Goal 15: \nlife on land",
                 method = "restyle"
               ),
               list(
                 args = list("y", list(
                   ~fitted(lm(goal16 ~ year, data = data_after_2015)),
                   ~fitted(lm(goal16 ~ year, data = data_before_2016))
                 )),
                 label = "Goal 16: peace, justice \nand strong institutions",
                 method = "restyle"
               ),
               list(
                 args = list("y", list(
                   ~fitted(lm(goal17 ~ year, data = data_after_2015)),
                   ~fitted(lm(goal17 ~ year, data = data_before_2016))
                 )),
                 label = "Goal 17: partnerships \nfor the goals",
                 method = "restyle"
               )
             )
           )
         )
  )

Simple OLS regression of the year-to-year difference in SDG scores on the after-2015 indicator:

Code
# Simple linear regression of the yearly difference in each SDG score on the binary variable "after2015"
library(huxtable)
reg2.1 <- lm(diff_overallscore ~ after2015, data=binary2015)
reg2.1.1 <- lm(diff_goal1 ~ after2015, data=binary2015)
reg2.1.2 <- lm(diff_goal2 ~ after2015, data=binary2015)
reg2.1.3 <- lm(diff_goal3 ~ after2015, data=binary2015)
reg2.1.4 <- lm(diff_goal4 ~ after2015, data=binary2015)
reg2.1.5 <- lm(diff_goal5 ~ after2015, data=binary2015)
reg2.1.6 <- lm(diff_goal6 ~ after2015, data=binary2015)
reg2.1.7 <- lm(diff_goal7 ~ after2015, data=binary2015)
reg2.1.8 <- lm(diff_goal8 ~ after2015, data=binary2015)
reg2.1.9 <- lm(diff_goal9 ~ after2015, data=binary2015)
reg2.1.10 <- lm(diff_goal10 ~ after2015, data=binary2015)
reg2.1.11 <- lm(diff_goal11 ~ after2015, data=binary2015)
reg2.1.12 <- lm(diff_goal12 ~ after2015, data=binary2015)
reg2.1.13 <- lm(diff_goal13 ~ after2015, data=binary2015)
reg2.1.15 <- lm(diff_goal15 ~ after2015, data=binary2015)
reg2.1.16 <- lm(diff_goal16 ~ after2015, data=binary2015)
reg2.1.17 <- lm(diff_goal17 ~ after2015, data=binary2015)

models_list1 <- list("Overall score"=reg2.1, "Goal 1"=reg2.1.1, "Goal 2"=reg2.1.2, "Goal 3"= reg2.1.3, "Goal 4"=reg2.1.4, "Goal 5"=reg2.1.5, "Goal 6"= reg2.1.6, "Goal 7"=reg2.1.7, "Goal 8"=reg2.1.8, "Goal 9"=reg2.1.9, "Goal 10"=reg2.1.10, "Goal 11" = reg2.1.11, "Goal 12"=reg2.1.12, "Goal 13"=reg2.1.13, "Goal 15" =reg2.1.15, "Goal 16"=reg2.1.16, "Goal 17"=reg2.1.17)

huxreg(models_list1[1:9])

Term         Overall score  Goal 1         Goal 2         Goal 3         Goal 4         Goal 5         Goal 6         Goal 7         Goal 8
(Intercept)  0.396 ***      0.528 ***      0.264 ***      0.763 ***      0.585 ***      0.853 ***      0.272 ***      0.512 ***      0.083 *
             (0.017)        (0.063)        (0.049)        (0.033)        (0.086)        (0.053)        (0.013)        (0.047)        (0.032)
after2015    -0.073 **      -0.197 *       -0.121         -0.432 ***     -0.277 *       -0.305 ***     -0.121 ***     -0.265 ***     0.179 ***
             (0.024)        (0.089)        (0.069)        (0.047)        (0.121)        (0.074)        (0.018)        (0.066)        (0.046)
N            2324           2114           2324           2324           2324           2324           2324           2324           2324
R2           0.004          0.002          0.001          0.035          0.002          0.007          0.018          0.007          0.007
logLik       -2027.219      -4521.177      -4477.397      -3603.060      -5785.705      -4652.486      -1407.912      -4370.917      -3512.847
AIC          4060.438       9048.355       8960.793       7212.120       11577.411      9310.972       2821.823       8747.834       7031.694
*** p < 0.001; ** p < 0.01; * p < 0.05.

Code
huxreg(models_list1[10:17])

Term         Goal 9         Goal 10        Goal 11        Goal 12        Goal 13        Goal 15        Goal 16        Goal 17
(Intercept)  1.495 ***      0.488 ***      0.246 **       0.049 *        0.075          0.204 ***      0.076          0.077
             (0.066)        (0.131)        (0.078)        (0.023)        (0.041)        (0.043)        (0.047)        (0.062)
after2015    -0.040         -0.149         -0.067         0.003          0.002          -0.207 ***     -0.279 ***     0.740 ***
             (0.093)        (0.186)        (0.111)        (0.033)        (0.058)        (0.061)        (0.066)        (0.088)
N            2324           2086           2324           2324           2324           2324           2324           2324
R2           0.000          0.000          0.000          0.000          0.000          0.005          0.008          0.029
logLik       -5166.416      -5974.324      -5581.016      -2730.794      -4082.237      -4195.544      -4374.902      -5046.575
AIC          10338.833      11954.647      11168.033      5467.588       8170.474       8397.088       8755.804       10099.151
*** p < 0.001; ** p < 0.01; * p < 0.05.
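
To read these tables, it helps to recall that, with a single binary regressor, the intercept is the mean of the outcome when `after2015` is 0 and the `after2015` coefficient is the difference in means between the two periods. For the overall score, for example, the estimates above imply an average yearly change of about 0.396 up to 2015 and about 0.396 − 0.073 = 0.323 afterwards. The sketch below checks this identity on a small hypothetical data set (`toy` and its values are invented for illustration).

```r
# Hypothetical yearly changes with the binary period indicator
toy <- data.frame(
  after2015 = c(0, 0, 0, 1, 1, 1),
  diff_overallscore = c(0.5, 0.4, 0.3, 0.35, 0.3, 0.25)
)

# Same model form as in the report: outcome on the binary indicator
fit <- lm(diff_overallscore ~ after2015, data = toy)

# Group means computed directly
mean_before <- mean(toy$diff_overallscore[toy$after2015 == 0])
mean_after <- mean(toy$diff_overallscore[toy$after2015 == 1])

# The intercept matches the "before" mean and the slope matches
# the difference between the "after" and "before" means
print(coef(fit))
print(c(mean_before = mean_before, difference = mean_after - mean_before))
```

This also explains the very low R2 values: the indicator only separates two period averages and leaves the variation between countries and years unexplained.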

Difference-in-differences (DiD) using panel data:

Code
# Create a panel data object
panel_data <- plm::pdata.frame(binary2015, index = c("country", "year"))

# Run the difference-in-differences model to take into account the general evolution over the years
reg2.2 <- plm::plm(diff_overallscore ~ after2015 + year + after2015:year, 
                 data = panel_data,
                 model = "within")
reg2.2.1 <- plm::plm(diff_goal1 ~ after2015 + year + after2015:year, 
              data = panel_data,
              model = "within")
reg2.2.2 <- plm::plm(diff_goal2 ~ after2015 + year + after2015:year, 
                data = panel_data,
                model = "within")
reg2.2.3 <- plm::plm(diff_goal3 ~ after2015 + year + after2015:year, 
                data = panel_data,
                model = "within")
reg2.2.4 <- plm::plm(diff_goal4 ~ after2015 + year + after2015:year, 
                data = panel_data,
                model = "within")
reg2.2.5 <- plm::plm(diff_goal5 ~ after2015 + year + after2015:year, 
                data = panel_data,
                model = "within")
reg2.2.6 <- plm::plm(diff_goal6 ~ after2015 + year + after2015:year, 
                data = panel_data,
                model = "within")
reg2.2.7 <- plm::plm(diff_goal7 ~ after2015 + year + after2015:year, 
                data = panel_data,
                model = "within")
reg2.2.8 <- plm::plm(diff_goal8 ~ after2015 + year + after2015:year, 
                data = panel_data,
                model = "within")
reg2.2.9 <- plm::plm(diff_goal9 ~ after2015 + year + after2015:year, 
                data = panel_data,
                model = "within")
reg2.2.10 <- plm::plm(diff_goal10 ~ after2015 + year + after2015:year, 
                data = panel_data,
                model = "within")
reg2.2.11 <- plm::plm(diff_goal11 ~ after2015 + year + after2015:year, 
                data = panel_data,
                model = "within")
reg2.2.12 <- plm::plm(diff_goal12 ~ after2015 + year + after2015:year, 
                data = panel_data,
                model = "within")
reg2.2.13 <- plm::plm(diff_goal13 ~ after2015 + year + after2015:year, 
                data = panel_data,
                model = "within")
reg2.2.15 <- plm::plm(diff_goal15 ~ after2015 + year + after2015:year, 
                data = panel_data,
                model = "within")
reg2.2.16 <- plm::plm(diff_goal16 ~ after2015 + year + after2015:year, 
                data = panel_data,
                model = "within")
reg2.2.17 <- plm::plm(diff_goal17 ~ after2015 + year + after2015:year, 
                data = panel_data,
                model = "within")

# Create a list of your regression models
models_list2 <- list("Overall score"=reg2.2, "Goal 1"=reg2.2.1, "Goal 2"=reg2.2.2, "Goal 3"= reg2.2.3, "Goal 4"=reg2.2.4, "Goal 5"=reg2.2.5, "Goal 6"= reg2.2.6, "Goal 7"=reg2.2.7, "Goal 8"=reg2.2.8, "Goal 9"=reg2.2.9, "Goal 10"=reg2.2.10, "Goal 11" = reg2.2.11,"Goal 12"=reg2.2.12, "Goal 13"=reg2.2.13, "Goal 15" =reg2.2.15, "Goal 16"=reg2.2.16, "Goal 17"=reg2.2.17)

huxreg(models_list2[1:9])
Overall score Goal 1 Goal 2 Goal 3 Goal 4 Goal 5 Goal 6 Goal 7 Goal 8
after2015 -0.281 *** 0.951 *** -0.475 *   -0.718 *** -0.355   -0.360     -0.320 *** -0.672 *** 0.740 ***
(0.061)    (0.212)    (0.184)    (0.114)    (0.320)  (0.195)    (0.042)    (0.173)    (0.109)   
year2010 0.007     0.000     -0.554 **  1.065 *** 0.343   0.059     -0.094 *   -0.214     0.284 ** 
(0.061)    (0.212)    (0.184)    (0.114)    (0.320)  (0.195)    (0.042)    (0.173)    (0.109)   
year2011 -0.068     0.860 *** -0.117     -0.135     0.432   -0.031     -0.037     -0.396 *   0.421 ***
(0.061)    (0.212)    (0.184)    (0.114)    (0.320)  (0.195)    (0.042)    (0.173)    (0.109)   
year2012 0.069     0.754 *** -0.590 **  -0.163     0.436   0.249     -0.080     -0.011     0.431 ***
(0.061)    (0.212)    (0.184)    (0.114)    (0.320)  (0.195)    (0.042)    (0.173)    (0.109)   
year2013 -0.018     0.983 *** -0.233     -0.287 *   0.051   1.304 *** -0.052     -0.072     0.393 ***
(0.061)    (0.212)    (0.184)    (0.114)    (0.320)  (0.195)    (0.042)    (0.173)    (0.109)   
year2014 0.158 *   0.489 *   0.072     -0.275 *   0.699 * 0.131     -0.061     -0.402 *   1.422 ***
(0.061)    (0.212)    (0.184)    (0.114)    (0.320)  (0.195)    (0.042)    (0.173)    (0.109)   
year2015 0.008     0.608 **  -0.719 *** 0.326 **  0.157   0.196     -0.009     -0.026     0.505 ***
(0.061)    (0.212)    (0.184)    (0.114)    (0.320)  (0.195)    (0.042)    (0.173)    (0.109)   
year2016 0.211 *** -0.587 **  -0.155     0.413 *** 0.537   0.309     0.238 *** 0.408 *   -0.237 *  
(0.061)    (0.212)    (0.184)    (0.114)    (0.320)  (0.195)    (0.042)    (0.173)    (0.109)   
year2017 0.519 *** -0.475 *   0.049     0.888 *** 0.567   0.484 *   0.251 *** 0.487 **  0.794 ***
(0.061)    (0.212)    (0.184)    (0.114)    (0.320)  (0.195)    (0.042)    (0.173)    (0.109)   
year2018 0.222 *** -0.507 *   0.219     0.491 *** 0.761 * 0.269     0.217 *** 0.313     -0.181    
(0.061)    (0.212)    (0.184)    (0.114)    (0.320)  (0.195)    (0.042)    (0.173)    (0.109)   
year2019 0.293 *** -0.744 *** 0.287     0.852 *** 0.325   0.590 **  0.169 *** 0.237     -0.235 *  
(0.061)    (0.212)    (0.184)    (0.114)    (0.320)  (0.195)    (0.042)    (0.173)    (0.109)   
year2020 0.121 *   -1.752 *** -0.040     -0.143     0.419   0.213     0.181 *** 0.283     -1.121 ***
(0.061)    (0.212)    (0.184)    (0.114)    (0.320)  (0.195)    (0.042)    (0.173)    (0.109)   
year2021 0.243 *** -0.278     -0.022     0.034     0.058   0.434 *   -0.000     -0.000     0.509 ***
(0.061)    (0.212)    (0.184)    (0.114)    (0.320)  (0.195)    (0.042)    (0.173)    (0.109)   
N 2324         2114         2324         2324         2324       2324         2324         2324         2324        
R2 0.048     0.062     0.021     0.182     0.010   0.043     0.064     0.020     0.219    
*** p < 0.001; ** p < 0.01; * p < 0.05.
Code
huxreg(models_list2[10:17])
Goal 9 Goal 10 Goal 11 Goal 12 Goal 13 Goal 15 Goal 16 Goal 17
after2015 -0.652 **  -1.093 * -0.437     -0.308 *** -0.404 **  -0.392 *   -1.160 *** -0.030    
(0.223)    (0.497)  (0.295)    (0.084)    (0.153)    (0.158)    (0.168)    (0.231)   
year2010 0.177     -0.485   0.123     -0.350 *** -0.689 *** 0.275     0.657 *** -0.457 *  
(0.223)    (0.497)  (0.295)    (0.084)    (0.153)    (0.158)    (0.168)    (0.231)   
year2011 0.485 *   -0.798   -0.878 **  -0.448 *** -0.772 *** -0.112     0.404 *   -0.277    
(0.223)    (0.497)  (0.295)    (0.084)    (0.153)    (0.158)    (0.168)    (0.231)   
year2012 1.370 *** -1.028 * 0.303     -0.235 **  -0.210     -0.164     0.208     -0.319    
(0.223)    (0.497)  (0.295)    (0.084)    (0.153)    (0.158)    (0.168)    (0.231)   
year2013 0.815 *** -0.782   -0.550     -0.392 *** -0.431 **  -0.067     -0.458 **  -0.249    
(0.223)    (0.497)  (0.295)    (0.084)    (0.153)    (0.158)    (0.168)    (0.231)   
year2014 0.905 *** -0.829   0.714 *   -0.261 **  -0.244     -0.164     0.844 *** -0.236    
(0.223)    (0.497)  (0.295)    (0.084)    (0.153)    (0.158)    (0.168)    (0.231)   
year2015 0.747 *** -0.315   -0.573     -0.128     0.044     -0.239     0.457 **  -0.787 ***
(0.223)    (0.497)  (0.295)    (0.084)    (0.153)    (0.158)    (0.168)    (0.231)   
year2016 0.686 **  0.356   1.270 *** -0.037     0.361 *   0.219     0.946 *** -0.461 *  
(0.223)    (0.497)  (0.295)    (0.084)    (0.153)    (0.158)    (0.168)    (0.231)   
year2017 3.478 *** 0.717   0.351     -0.061     -0.014     0.298     1.397 *** 0.427    
(0.223)    (0.497)  (0.295)    (0.084)    (0.153)    (0.158)    (0.168)    (0.231)   
year2018 1.054 *** 0.437   0.168     -0.038     -0.204     -0.737 *** 1.875 *** 0.585 *  
(0.223)    (0.497)  (0.295)    (0.084)    (0.153)    (0.158)    (0.168)    (0.231)   
year2019 1.189 *** 0.602   0.108     -0.072     0.146     0.193     1.467 *** 0.008    
(0.223)    (0.497)  (0.295)    (0.084)    (0.153)    (0.158)    (0.168)    (0.231)   
year2020 1.220 *** 0.271   -0.069     0.015     0.440 **  0.169     1.510 *** 1.320 ***
(0.223)    (0.497)  (0.295)    (0.084)    (0.153)    (0.158)    (0.168)    (0.231)   
year2021 1.163 *** -0.007   -0.103     0.555 *** -0.185     0.681 *** 1.083 *** 1.183 ***
(0.223)    (0.497)  (0.295)    (0.084)    (0.153)    (0.158)    (0.168)    (0.231)   
N 2324         2086       2324         2324         2324         2324         2324         2324        
R2 0.133     0.005   0.034     0.056     0.038     0.051     0.106     0.078    
*** p < 0.001; ** p < 0.01; * p < 0.05.
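
The seventeen regression calls above differ only in their outcome variable, so they could also be generated in a loop. The sketch below illustrates the idea on a made-up toy panel; the data, the `fit_one` helper and the simplified specification are assumptions for illustration only. To stay runnable with base R it uses `lm()` with country dummies as a stand-in for the `plm` within model.

```r
# Toy panel: 3 hypothetical countries observed over 4 years (all values are made up).
set.seed(1)
toy_panel <- expand.grid(country = c("A", "B", "C"), year = 2013:2016)
toy_panel$after2015 <- as.integer(toy_panel$year >= 2015)
toy_panel$diff_overallscore <- rnorm(nrow(toy_panel))
toy_panel$diff_goal1 <- rnorm(nrow(toy_panel))
toy_panel$diff_goal2 <- rnorm(nrow(toy_panel))

# Outcome columns to loop over (the report would list all diff_goal* columns).
outcomes <- c("diff_overallscore", "diff_goal1", "diff_goal2")

# Hypothetical helper: fit one model per outcome.
# Simplified specification: outcome ~ after2015 + country dummies.
# In the report's setting, the formula could instead be passed to
# plm::plm(f, data = panel_data, model = "within").
fit_one <- function(outcome, data) {
  f <- reformulate(c("after2015", "factor(country)"), response = outcome)
  lm(f, data = data)
}

toy_models <- lapply(outcomes, fit_one, data = toy_panel)
names(toy_models) <- outcomes

# Collect the after2015 coefficient from every model.
after2015_coefs <- sapply(toy_models, function(m) coef(m)[["after2015"]])
print(after2015_coefs)
```

Building the formulas with `reformulate()` keeps the specification in one place, so a change to the right-hand side only has to be made once.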

5 Analysis

5.1 Answers to the research questions

5.1.1 Focus on relationship between SDGs

How are the different SDGs linked? (We want to see whether a high score on one SDG goes together with a high score on another, and thus whether we can form groups of SDGs that are comparable in that way.)

5.2 Focus on relationship between SDGs

Let’s analyse the relationships between the SDGs. To do so, we import our dataset and keep only the columns that contain the goal scores. We then build a correlation matrix and display only the pairs of goals whose correlation coefficient is either greater than 0.5 (indicating a strong positive relationship) or less than -0.5 (indicating a strong negative relationship). This approach lets us identify and analyse the strongest relationships between the SDGs.

Code
data_4 <- read.csv(here::here("scripts", "data", "data_question24.csv"))
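# Note: na.omit() on a data.frame ignores `cols` (a data.table argument),
# so rows with an NA in any column are dropped
goals_data_4_cl <- na.omit(data_4, cols=c("goal1", "goal10"))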
goals_data_4_cl <- goals_data_4_cl[, grepl("goal", names(goals_data_4_cl))]

Given that our variables do not follow a normal distribution, the Pearson correlation is not suitable for our analysis. We attempted to normalize the data through logarithmic and square root transformations, but these adjustments were not effective enough. Consequently, we compute the Spearman correlation instead. While not ideal, this rank-based method does not require normally distributed data. For the heatmap visualization, we keep only the correlations whose absolute value reaches the threshold stored in threashold_heatmap. This selective approach improves the readability and interpretability of the heatmap.

Code
spearman_corr_4_cl <- cor(goals_data_4_cl, method = "spearman", use = "everything")
spearman_corr_4_cl[abs(spearman_corr_4_cl) < threashold_heatmap] <- NA
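
To make the choice of Spearman more concrete, the sketch below uses made-up toy data (not the SDG scores) to show that the Spearman coefficient is simply the Pearson coefficient computed on ranks, and how a threshold mask like the one above hides weak correlations. The variable names and the 0.5 cut-off are assumptions for illustration.

```r
set.seed(42)
x <- rexp(200)                 # skewed, clearly non-normal values
toy <- data.frame(
  goal_a = x,
  goal_b = exp(x),             # monotone but non-linear function of goal_a
  goal_c = rnorm(200)          # unrelated noise
)

# Spearman correlation of the raw values ...
spearman <- cor(toy, method = "spearman")
# ... equals the Pearson correlation of the ranks.
pearson_on_ranks <- cor(sapply(toy, rank))

# Pearson on the raw values understates the monotone link between goal_a and goal_b.
pearson_raw <- cor(toy$goal_a, toy$goal_b)

# Mask correlations below a hypothetical threshold, as done for the heatmap.
toy_threshold <- 0.5
masked <- spearman
masked[abs(masked) < toy_threshold] <- NA
print(round(masked, 2))
```

Because ranks ignore the shape of the relationship, a perfectly monotone link gets a Spearman coefficient of 1 even when the Pearson coefficient on the raw values is lower.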

We can then plot the heatmap of the Spearman correlations using the ggplot2 package.

Code
# Melting the data
melted_corr_4 <- melt(spearman_corr_4_cl, na.rm = TRUE)

# Creating the heatmap
ggplot(data = melted_corr_4, aes(x = Var1, y = Var2, fill = value)) +
    geom_tile() +
    geom_text(aes(label = sprintf("%.2f", value)), vjust = 0.5, size=2.5) + # Adding text
    scale_fill_gradient2(low = "blue", high = "red", mid = "white", 
                         midpoint = 0, limit = c(-1,1), space = "Lab", 
                         name="Spearman\nCorrelation",
                         na.value = "grey") +
    theme_minimal() +
    theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
    labs(title = "Heatmap of Spearman Correlations for Goals", 
         x = "", y = "")

It is evident that the Sustainable Development Goals (SDGs) are intricately interconnected. However, certain goals appear to be less interrelated compared to others. Specifically, SDG 1 (No Poverty) and SDG 10 (Reduced Inequalities) demonstrate a weaker correlation with the rest of the goals. Similarly, Goal 15 (Life on Land) also exhibits a lesser degree of interconnection with the other SDGs.

Code
# Selecting only numeric columns, assuming they are named as 'goal1', 'goal2', etc.
goals_data <- goals_data_4_cl[, grep('goal', names(goals_data_4_cl))]
goals_data_scaled <- scale(goals_data) # Scaling the data
pca_result <- prcomp(goals_data_scaled) # Running PCA

# Plotting Scree plot to visualize the importance of each principal component
fviz_eig(pca_result,
         addlabels = TRUE) +
  theme_minimal()

# Plotting Biplot to visualize the two main PCs
fviz_pca_biplot(pca_result,
                label="var",
                col.var="dodgerblue3",
                geom="point",
                pointsize = 0.1,
                labelsize = 5) +
  theme_minimal()

In the EDA for the research question on the influence of the factors on the SDG scores, we built a correlation matrix heatmap covering every variable of our dataset. Here, we zoom in on certain parts of that heatmap. The graphs display a correlation only when its p-value is significant (alpha = 0.05); the grey zones correspond to non-significant p-values.

Let’s first look at the correlation matrix heatmap between our SDG goals and all the variables other than the SDG goals.

Code
corr_matrix <- cor(data_question1[7:40])
p_matrix2 <- matrix(nrow = ncol(data_question1[7:40]), ncol = ncol(data_question1[7:40]))
for (i in 1:ncol(data_question1[7:40])) {
  for (j in 1:ncol(data_question1[7:40])) {
    test_result <- cor.test(data_question1[7:40][, i], data_question1[7:40][, j])
    p_matrix2[i, j] <- test_result$p.value
  }
}

# Blank out the correlations that are not significant at the 5% level
corr_matrix[which(p_matrix2 > 0.05)] <- NA
melted_corr_matrix_GVar <- melt(corr_matrix[19:34,1:18])
ggplot(melted_corr_matrix_GVar, aes(Var1, Var2, fill = value)) +
  geom_tile() +
  geom_text(aes(label = ifelse(!is.na(value), sprintf("%.2f", value), '')),
            color = "black", size = 2) +
  scale_fill_gradient2(low = "blue", high = "red", mid = "white",
                       midpoint = 0, limit = c(-1, 1), space = "Lab",
                       name = "Pearson\nCorrelation") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1),
        axis.text.y = element_text(angle = 45, hjust = 1)) +
  labs(x = 'Other variables', y = 'Goals',
       title = 'Correlations Heatmap between goals and our other variables')
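
The significance mask used for this heatmap can be sketched on made-up toy data as follows. The variables `v1` to `v3` are hypothetical; the sketch computes each p-value only once (upper triangle) and mirrors it, which avoids testing every pair twice.

```r
set.seed(7)
n <- 100
toy <- data.frame(v1 = rnorm(n))
toy$v2 <- toy$v1 + rnorm(n, sd = 0.3)   # strongly related to v1
toy$v3 <- rnorm(n)                      # independent noise

k <- ncol(toy)
corr <- cor(toy)

# p-value matrix with the same row and column names as the correlation matrix
pvals <- matrix(NA_real_, k, k, dimnames = dimnames(corr))
for (i in seq_len(k - 1)) {
  for (j in (i + 1):k) {
    p <- cor.test(toy[[i]], toy[[j]])$p.value
    pvals[i, j] <- p   # upper triangle
    pvals[j, i] <- p   # mirrored into the lower triangle
  }
}
diag(pvals) <- 0       # a variable is trivially correlated with itself

# Keep only correlations significant at the 5% level
corr_masked <- corr
corr_masked[pvals > 0.05] <- NA
print(round(corr_masked, 2))
```

Note that with many pairwise tests some correlations will appear significant by chance alone, so a multiple-testing adjustment such as `p.adjust()` may be worth considering.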

As we can see, SDG goals 12 and 13 (responsible consumption and production, and climate action) are negatively correlated with most of our variables, and the ef_government variable is likewise negatively correlated with most SDG goals. One reading is that a higher Human Freedom Index score goes together with lower scores on these two goals: for example, the more easily people in a country can access and afford civil justice, the lower its scores on goals 12 and 13 tend to be.

Nevertheless, goals 12 and 13 and ef_government are positively correlated with each other. In addition, some variables such as internet_usage, pf_law or ef_legal are strongly correlated with most of our SDG goals. This is probably due to the broad scope of these variables: they affect many sectors of the economy and are therefore related to almost all SDG goals.

Now let’s zoom in on the correlations between all our variables other than the SDG goals:

Code
melted_corr_matrix_Var <- melt(corr_matrix[19:34,19:34])
ggplot(melted_corr_matrix_Var, aes(Var1, Var2, fill = value)) +
  geom_tile() +
  geom_text(aes(label = ifelse(!is.na(value), sprintf("%.2f", value), '')),
            color = "black", size = 1.7) +
  scale_fill_gradient2(low = "blue", high = "red", mid = "white",
                       midpoint = 0, limit = c(-1, 1), space = "Lab",
                       name = "Pearson\nCorrelation") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1),
        axis.text.y = element_text(angle = 45, hjust = 1)) +
  labs(x = 'Variables', y = 'Variables',
       title = 'Correlations Heatmap between other variables than SDG goals')

Code
#### PCA ####

# for goals
myPCA_g <- PCA(data_question1[,9:24], graph = FALSE)
plot(myPCA_g$ind$coord[, 1], myPCA_g$ind$coord[, 2], xlab = "PC1", ylab = "PC2", main = "PCA Plot SDG Goals", pch = 19, col = "blue", cex = 0.5)
# Base graphics calls are separate statements, not chained with `+`
abline(h = 0, col = "red", lty = 2)
abline(v = 0, col = "red", lty = 2)
plot.PCA(myPCA_g, choix = "var", pch = 10, cex = 0.6)

summary(myPCA_g)
#> 
#> Call:
#> PCA(X = data_question1[, 9:24], graph = FALSE) 
#> 
#> 
#> Eigenvalues
#>                        Dim.1   Dim.2   Dim.3   Dim.4   Dim.5   Dim.6
#> Variance              10.128   1.458   0.885   0.716   0.706   0.502
#> % of var.             63.300   9.111   5.530   4.474   4.409   3.135
#> Cumulative % of var.  63.300  72.411  77.942  82.415  86.825  89.960
#>                        Dim.7   Dim.8   Dim.9  Dim.10  Dim.11  Dim.12
#> Variance               0.294   0.275   0.252   0.198   0.150   0.129
#> % of var.              1.840   1.721   1.578   1.235   0.940   0.803
#> Cumulative % of var.  91.800  93.521  95.099  96.334  97.274  98.077
#>                       Dim.13  Dim.14  Dim.15  Dim.16
#> Variance               0.100   0.087   0.063   0.057
#> % of var.              0.627   0.543   0.397   0.356
#> Cumulative % of var.  98.704  99.247  99.644 100.000
#> 
#> Individuals (the 10 first)
#>            Dist    Dim.1    ctr   cos2    Dim.2    ctr   cos2  
#> 1      |  2.872 | -0.423  0.001  0.022 |  0.260  0.002  0.008 |
#> 2      |  2.870 | -0.430  0.001  0.022 |  0.263  0.002  0.008 |
#> 3      |  2.869 | -0.443  0.001  0.024 |  0.218  0.001  0.006 |
#> 4      |  2.942 | -0.451  0.001  0.023 |  0.175  0.001  0.004 |
#> 5      |  2.922 | -0.344  0.001  0.014 |  0.059  0.000  0.000 |
#> 6      |  2.946 | -0.300  0.000  0.010 | -0.011  0.000  0.000 |
#> 7      |  2.891 | -0.289  0.000  0.010 | -0.031  0.000  0.000 |
#> 8      |  2.842 | -0.275  0.000  0.009 | -0.075  0.000  0.001 |
#> 9      |  2.715 | -0.078  0.000  0.001 | -0.073  0.000  0.001 |
#> 10     |  2.665 |  0.033  0.000  0.000 | -0.071  0.000  0.001 |
#>         Dim.3    ctr   cos2  
#> 1      -0.550  0.015  0.037 |
#> 2      -0.522  0.014  0.033 |
#> 3      -0.532  0.014  0.034 |
#> 4      -0.464  0.011  0.025 |
#> 5      -0.351  0.006  0.014 |
#> 6      -0.397  0.008  0.018 |
#> 7      -0.416  0.009  0.021 |
#> 8      -0.396  0.008  0.019 |
#> 9      -0.517  0.014  0.036 |
#> 10     -0.358  0.007  0.018 |
#> 
#> Variables (the 10 first)
#>           Dim.1    ctr   cos2    Dim.2    ctr   cos2    Dim.3    ctr
#> goal1  |  0.864  7.367  0.746 |  0.205  2.885  0.042 | -0.166  3.096
#> goal2  |  0.665  4.361  0.442 |  0.166  1.886  0.028 | -0.282  9.009
#> goal3  |  0.944  8.797  0.891 |  0.169  1.952  0.028 | -0.112  1.426
#> goal4  |  0.862  7.329  0.742 |  0.293  5.890  0.086 | -0.025  0.073
#> goal5  |  0.737  5.356  0.542 |  0.056  0.216  0.003 |  0.422 20.135
#> goal6  |  0.904  8.074  0.818 |  0.088  0.529  0.008 |  0.029  0.098
#> goal7  |  0.864  7.362  0.746 |  0.287  5.658  0.082 | -0.037  0.154
#> goal8  |  0.822  6.674  0.676 | -0.155  1.655  0.024 | -0.105  1.243
#> goal9  |  0.901  8.008  0.811 | -0.131  1.178  0.017 | -0.022  0.055
#> goal10 |  0.566  3.166  0.321 | -0.544 20.333  0.296 | -0.446 22.518
#>          cos2  
#> goal1   0.027 |
#> goal2   0.080 |
#> goal3   0.013 |
#> goal4   0.001 |
#> goal5   0.178 |
#> goal6   0.001 |
#> goal7   0.001 |
#> goal8   0.011 |
#> goal9   0.000 |
#> goal10  0.199 |
myPCA_g$eig
#>         eigenvalue percentage of variance
#> comp 1     10.1280                 63.300
#> comp 2      1.4578                  9.111
#> comp 3      0.8849                  5.530
#> comp 4      0.7158                  4.474
#> comp 5      0.7055                  4.409
#> comp 6      0.5017                  3.135
#> comp 7      0.2944                  1.840
#> comp 8      0.2754                  1.721
#> comp 9      0.2524                  1.578
#> comp 10     0.1976                  1.235
#> comp 11     0.1504                  0.940
#> comp 12     0.1285                  0.803
#> comp 13     0.1003                  0.627
#> comp 14     0.0869                  0.543
#> comp 15     0.0635                  0.397
#> comp 16     0.0570                  0.356
#>         cumulative percentage of variance
#> comp 1                               63.3
#> comp 2                               72.4
#> comp 3                               77.9
#> comp 4                               82.4
#> comp 5                               86.8
#> comp 6                               90.0
#> comp 7                               91.8
#> comp 8                               93.5
#> comp 9                               95.1
#> comp 10                              96.3
#> comp 11                              97.3
#> comp 12                              98.1
#> comp 13                              98.7
#> comp 14                              99.2
#> comp 15                              99.6
#> comp 16                             100.0

Concerning the SDG goals, most variables load on the first component, except goals 10 and 15, which are only weakly associated with dimension 1. In addition, as seen before, goals 12 and 13 are negatively correlated with the other goals. Since only the first two components have an eigenvalue greater than 1, the Kaiser-Guttman rule suggests retaining 2 dimensions. Nevertheless, together they explain only 72.4% of the variance, which is below 80%.
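
As a quick check of this reasoning, the sketch below applies the Kaiser-Guttman rule and an 80% cumulative-variance criterion to the eigenvalues printed by `myPCA_g$eig` above. The values are copied by hand from that output; the variable names are ours and purely illustrative.

```r
# Eigenvalues copied from the myPCA_g$eig output shown above
eig <- c(10.1280, 1.4578, 0.8849, 0.7158, 0.7055, 0.5017, 0.2944, 0.2754,
         0.2524, 0.1976, 0.1504, 0.1285, 0.1003, 0.0869, 0.0635, 0.0570)

# Kaiser-Guttman rule: keep the components with an eigenvalue above 1
n_kaiser <- sum(eig > 1)

# Cumulative share of variance explained, in percent
cum_pct <- 100 * cumsum(eig) / sum(eig)

# Smallest number of components reaching at least 80% of the variance
n_for_80 <- which(cum_pct >= 80)[1]

cat("Components kept by Kaiser-Guttman:", n_kaiser, "\n")
cat("Cumulative variance of those components:", round(cum_pct[n_kaiser], 1), "%\n")
cat("Components needed for 80%:", n_for_80, "\n")
```

The two criteria disagree here: the eigenvalue rule keeps 2 components, while an 80% target would require 4.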

Code
#for HFI scores
myPCA_s <- PCA(data_question1[,29:40], graph = FALSE)
plot(myPCA_s$ind$coord[, 1], myPCA_s$ind$coord[, 2], xlab = "PC1", ylab = "PC2", main = "PCA Plot HFI Scores", pch = 19, col = "blue", cex = 0.5)
# Base graphics calls are separate statements, not chained with `+`
abline(h = 0, col = "red", lty = 2)
abline(v = 0, col = "red", lty = 2)
plot.PCA(myPCA_s, choix = "var",cex = 0.5)
summary(myPCA_s)
#> 
#> Call:
#> PCA(X = data_question1[, 29:40], graph = FALSE) 
#> 
#> 
#> Eigenvalues
#>                        Dim.1   Dim.2   Dim.3   Dim.4   Dim.5   Dim.6
#> Variance               6.710   1.577   1.014   0.731   0.507   0.419
#> % of var.             55.915  13.140   8.453   6.093   4.222   3.491
#> Cumulative % of var.  55.915  69.055  77.507  83.601  87.823  91.314
#>                        Dim.7   Dim.8   Dim.9  Dim.10  Dim.11  Dim.12
#> Variance               0.287   0.218   0.192   0.168   0.106   0.070
#> % of var.              2.395   1.820   1.602   1.402   0.882   0.585
#> Cumulative % of var.  93.710  95.530  97.132  98.533  99.415 100.000
#> 
#> Individuals (the 10 first)
#>                   Dist    Dim.1    ctr   cos2    Dim.2    ctr   cos2
#> 1             |  2.143 | -0.207  0.000  0.009 |  1.261  0.045  0.346
#> 2             |  2.085 | -0.135  0.000  0.004 |  1.325  0.050  0.404
#> 3             |  2.413 |  0.027  0.000  0.000 |  1.656  0.078  0.471
#> 4             |  2.529 |  0.530  0.002  0.044 |  1.430  0.058  0.320
#> 5             |  2.416 |  0.364  0.001  0.023 |  1.272  0.046  0.277
#> 6             |  2.277 |  0.378  0.001  0.028 |  1.146  0.037  0.253
#> 7             |  2.320 |  0.613  0.003  0.070 |  1.196  0.041  0.266
#> 8             |  2.605 |  0.726  0.004  0.078 |  1.614  0.074  0.384
#> 9             |  2.335 |  0.850  0.005  0.132 |  1.287  0.047  0.304
#> 10            |  2.183 |  0.909  0.006  0.173 |  0.982  0.027  0.202
#>                  Dim.3    ctr   cos2  
#> 1             | -0.542  0.013  0.064 |
#> 2             | -0.253  0.003  0.015 |
#> 3             |  0.176  0.001  0.005 |
#> 4             |  0.990  0.043  0.153 |
#> 5             |  0.579  0.015  0.057 |
#> 6             |  0.341  0.005  0.022 |
#> 7             |  0.494  0.011  0.045 |
#> 8             |  0.411  0.007  0.025 |
#> 9             |  0.292  0.004  0.016 |
#> 10            |  0.214  0.002  0.010 |
#> 
#> Variables (the 10 first)
#>                  Dim.1    ctr   cos2    Dim.2    ctr   cos2    Dim.3
#> pf_law        |  0.871 11.310  0.759 | -0.301  5.732  0.090 | -0.110
#> pf_security   |  0.578  4.984  0.334 | -0.446 12.630  0.199 | -0.208
#> pf_movement   |  0.837 10.432  0.700 |  0.282  5.028  0.079 | -0.148
#> pf_religion   |  0.704  7.392  0.496 |  0.537 18.285  0.288 | -0.299
#> pf_assembly   |  0.839 10.482  0.703 |  0.404 10.343  0.163 | -0.206
#> pf_expression |  0.890 11.814  0.793 |  0.171  1.855  0.029 | -0.241
#> pf_identity   |  0.668  6.650  0.446 | -0.007  0.003  0.000 |  0.034
#> ef_government | -0.154  0.354  0.024 |  0.779 38.445  0.606 |  0.435
#> ef_legal      |  0.871 11.314  0.759 | -0.302  5.791  0.091 |  0.052
#> ef_money      |  0.690  7.104  0.477 | -0.128  1.047  0.017 |  0.544
#>                  ctr   cos2  
#> pf_law         1.189  0.012 |
#> pf_security    4.245  0.043 |
#> pf_movement    2.164  0.022 |
#> pf_religion    8.814  0.089 |
#> pf_assembly    4.167  0.042 |
#> pf_expression  5.703  0.058 |
#> pf_identity    0.113  0.001 |
#> ef_government 18.631  0.189 |
#> ef_legal       0.262  0.003 |
#> ef_money      29.130  0.295 |
myPCA_s$eig
#>         eigenvalue percentage of variance
#> comp 1      6.7098                 55.915
#> comp 2      1.5768                 13.140
#> comp 3      1.0143                  8.453
#> comp 4      0.7312                  6.093
#> comp 5      0.5067                  4.222
#> comp 6      0.4189                  3.491
#> comp 7      0.2874                  2.395
#> comp 8      0.2184                  1.820
#> comp 9      0.1922                  1.602
#> comp 10     0.1682                  1.402
#> comp 11     0.1058                  0.882
#> comp 12     0.0702                  0.585
#>         cumulative percentage of variance
#> comp 1                               55.9
#> comp 2                               69.1
#> comp 3                               77.5
#> comp 4                               83.6
#> comp 5                               87.8
#> comp 6                               91.3
#> comp 7                               93.7
#> comp 8                               95.5
#> comp 9                               97.1
#> comp 10                              98.5
#> comp 11                              99.4
#> comp 12                             100.0

Turning to the Human Freedom Index scores, most variables are positively correlated with dimension 1, somewhat less so for pf_religion and pf_security, while ef_government is essentially uncorrelated with it. Since the first three components have an eigenvalue greater than 1, we retain 3 dimensions. Nevertheless, again, together they explain only 77.5% of the variance, which is below 80%.

Code
#### Kmean clustering ####

data1_scaled <- scale(Correlation_overall)
rownames(data1_scaled) <- seq_along(row.names(data1_scaled))
fviz_nbclust(data1_scaled, kmeans, method="wss")
kmean <- kmeans(data1_scaled, 7, nstart = 25)
print(kmean)
#> K-means clustering with 7 clusters of sizes 42, 437, 364, 176, 255, 546, 406
#> 
#> Cluster means:
#>   population overallscore  goal1   goal2   goal3   goal4  goal5
#> 1     6.8684      -0.4244 -0.249  0.5022 -0.1391  0.4264 -0.247
#> 2    -0.2419       0.8385  0.766  0.5711  0.7861  0.7570  0.380
#> 3    -0.0783       1.1923  0.791  0.8288  1.1190  0.8217  1.156
#> 4    -0.0894      -1.5887 -1.550 -1.4207 -1.5960 -1.6346 -1.068
#> 5    -0.0581       0.1402  0.538 -0.0692  0.2117 -0.0244 -0.688
#> 6    -0.1222      -0.0661  0.178 -0.1801  0.0317  0.2807  0.195
#> 7    -0.1402      -1.2380 -1.414 -0.5083 -1.3186 -1.2491 -0.787
#>     goal6  goal7  goal8   goal9 goal10 goal11 goal12 goal13 goal15
#> 1 -0.6983 -0.211 -0.067  0.0598 -0.762 -0.784  0.768  0.392 -1.420
#> 2  0.7100  0.764  0.679  0.5920  0.678  0.716 -0.671 -0.401  0.615
#> 3  1.2613  0.833  1.398  1.6099  1.034  1.013 -1.679 -1.708  0.437
#> 4 -1.2696 -1.514 -1.543 -1.1043 -0.415 -1.565  0.966  0.809  0.411
#> 5 -0.2680  0.262 -0.493 -0.1540  0.255  0.111  0.459  0.284 -0.501
#> 6  0.0575  0.178 -0.334 -0.4117 -0.868  0.194  0.331  0.425 -0.495
#> 7 -1.1813 -1.295 -0.550 -0.9577 -0.391 -1.250  0.996  0.821 -0.105
#>    goal16   goal17 unemployment.rate GDPpercapita
#> 1 -0.7164 -1.15596            -0.239       -0.533
#> 2  0.7276 -0.00469             0.393        0.190
#> 3  1.4596  0.95486            -0.301        1.883
#> 4 -1.1880 -1.14051            -0.414       -0.654
#> 5 -0.0977  0.54351             0.346       -0.467
#> 6 -0.4037  0.21920             0.249       -0.462
#> 7 -0.8985 -0.87320            -0.501       -0.639
#>   MilitaryExpenditurePercentGDP internet_usage pf_law pf_security
#> 1                         0.475        -0.4703 -0.656      0.0085
#> 2                         0.110         0.6438  0.744      0.7142
#> 3                        -0.266         1.3222  1.513      0.9034
#> 4                         0.471        -0.9943 -1.195     -1.1843
#> 5                         1.151        -0.0942 -0.587     -0.1417
#> 6                        -0.252        -0.2986 -0.380     -0.6332
#> 7                        -0.518        -0.9381 -0.692     -0.1257
#>   pf_movement pf_religion pf_assembly pf_expression pf_identity
#> 1      -1.401      -2.227      -1.563      -0.81722       0.121
#> 2       0.614       0.478       0.658       0.65071       0.723
#> 3       0.850       0.678       0.859       1.21634       0.867
#> 4      -1.514      -0.850      -1.491      -1.36775      -0.590
#> 5      -1.210      -1.809      -1.473      -1.27219      -0.963
#> 6       0.313       0.383       0.297      -0.00344       0.256
#> 7      -0.283       0.097      -0.144      -0.30978      -1.052
#>   ef_government ef_legal ef_money ef_trade ef_regulation
#> 1        -0.573   -0.135  -0.3574  -0.7681      -0.70500
#> 2        -0.220    0.577   0.6741   0.8318       0.40091
#> 3        -0.743    1.614   0.9280   0.9810       1.06502
#> 4        -0.183   -1.378  -1.4923  -1.7207      -1.36482
#> 5        -0.325   -0.477  -0.3958  -0.5407      -0.44812
#> 6         0.676   -0.266   0.0399   0.0478      -0.00421
#> 7         0.336   -0.799  -0.6787  -0.6741      -0.43467
#> 
#> Clustering vector:
#>    [1] 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 2 2 2 2 2 2 6 6 6 6 6 6 6 6 6 6
#>   [32] 6 6 6 6 6 6 6 6 6 6 6 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
#>   [63] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 5 5 5 5 5 5 5 5 5
#>   [94] 5 5 5 5 5 5 5 5 5 5 5 5 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
#>  [125] 4 4 3 2 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 7 7 7 7 7 7 7 7
#>  [156] 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7
#>  [187] 7 5 5 6 6 6 6 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 6 6 6 6 6 6 6
#>  [218] 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6
#>  [249] 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 4 4 4 4 4 4
#>  [280] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
#>  [311] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 2 2 2 2 2
#>  [342] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
#>  [373] 1 1 1 1 1 1 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 4 4 4 4 4 4 4 4 4
#>  [404] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
#>  [435] 4 4 4 4 4 4 4 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 2 2 2
#>  [466] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
#>  [497] 2 2 2 2 2 3 3 2 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
#>  [528] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 6 6 6 6 6 6 6 6 6 6 6 6
#>  [559] 6 6 6 6 6 6 6 6 6 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 6
#>  [590] 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 5 5 5 5 5 5 5 5 5 5 5
#>  [621] 5 5 5 5 5 5 5 5 5 5 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
#>  [652] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
#>  [683] 3 3 3 3 3 3 3 3 3 3 3 6 6 6 6 6 6 6 5 5 5 5 5 5 5 6 6 6 6 6 6
#>  [714] 6 2 2 2 2 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 7 7 7 7 7 7 7 7 7
#>  [745] 7 7 7 7 6 6 6 6 6 6 6 6 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
#>  [776] 3 3 6 6 6 6 6 6 6 5 5 5 6 6 6 6 2 2 2 2 2 2 2 7 7 7 7 7 7 7 7
#>  [807] 7 7 7 7 7 7 7 7 7 7 6 6 6 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
#>  [838] 2 2 2 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6
#>  [869] 6 6 6 6 6 6 6 6 6 6 6 6 6 6 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
#>  [900] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 7 7 7 6 6 6
#>  [931] 6 6 6 6 6 6 6 6 6 6 6 6 6 6 5 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
#>  [962] 1 1 1 1 1 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 5 5 5 5 5
#>  [993] 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
#> [1024] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 6 6 6 6
#> [1055] 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 5 5 5 5 5 5 5 5 5 5 5 5 5 5
#> [1086] 5 5 5 5 5 5 5 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 7 7 7
#> [1117] 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 2 2 2 2 2 2 2 2 2 2 2 2 2
#> [1148] 2 2 2 2 2 2 3 2 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 6 6 6 6 6 6 2 2
#> [1179] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 3 3 3 3 3 3 3 3 3 3 3 3
#> [1210] 3 3 3 3 3 3 3 3 3 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 5
#> [1241] 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 7 7 7 7 7 7 7 7 7 7 7
#> [1272] 7 7 7 7 7 7 7 7 7 7 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6
#> [1303] 6 5 6 6 6 6 6 6 6 6 6 2 2 2 2 6 6 2 2 2 2 7 7 7 7 7 7 7 7 7 7
#> [1334] 7 7 7 7 7 7 7 7 7 7 7 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
#> [1365] 2 4 4 4 4 4 4 4 4 4 4 4 4 4 4 5 5 5 5 5 5 5 6 6 6 6 6 6 6 6 6
#> [1396] 6 6 6 6 6 6 6 6 6 6 6 6 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7
#> [1427] 7 7 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 7 7 7 7 7 7 7 7
#> [1458] 7 7 7 7 7 7 7 7 7 7 7 7 7 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5
#> [1489] 6 6 5 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 7 7 7 7 4 7 7
#> [1520] 7 7 4 7 7 7 7 7 7 7 7 7 7 7 4 4 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7
#> [1551] 7 7 7 7 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 5 5 3 3 3 3 3 3
#> [1582] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
#> [1613] 3 3 3 3 3 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 6 6 6 6 6 4 4 4 4 4
#> [1644] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6
#> [1675] 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 7 7 7 7
#> [1706] 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 2 2 2 2 2 2 2 2 2 2 2 2 2 2
#> [1737] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 6 6 6
#> [1768] 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 2 2 2 2 2 2
#> [1799] 2 2 2 2 2 2 2 2 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 4 4
#> [1830] 4 4 4 7 7 7 7 7 7 7 7 7 7 7 5 5 5 5 5 7 7 7 7 7 7 7 7 7 7 7 7
#> [1861] 7 7 7 7 7 7 7 7 7 4 4 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 6
#> [1892] 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 2 2 2 2 2 2 2 2 2 2 2
#> [1923] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 3 3 3
#> [1954] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 4 4 4 4 4 4 4 4 4 4
#> [1985] 4 4 4 4 4 4 4 4 4 4 4 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7
#> [2016] 7 6 6 6 6 6 6 6 6 6 6 6 6 6 6 5 5 5 5 5 6 5 5 5 5 5 5 5 5 5 5
#> [2047] 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 6 6 5 5 5 5 5 5
#> [2078] 5 5 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7
#> [2109] 7 7 7 7 7 7 7 7 7 7 7 7 7 6 6 6 6 6 6 6 2 2 2 2 2 2 2 2 2 2 2
#> [2140] 2 2 2 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 6 6 6 6 6 6 6
#> [2171] 6 6 6 6 6 6 6 6 6 6 6 6 6 6 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7
#> [2202] 7 7 7 7 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 7 4 4 4 4 4
#> 
#> Within cluster sum of squares by cluster:
#> [1]  770 4373 2889 2926 4117 8345 4653
#>  (between_SS / total_SS =  62.9 %)
#> 
#> Available components:
#> 
#> [1] "cluster"      "centers"      "totss"        "withinss"    
#> [5] "tot.withinss" "betweenss"    "size"         "iter"        
#> [9] "ifault"
fviz_cluster(kmean, data=data1_scaled, repel=FALSE, depth =NULL, ellipse.type = "norm", labelsize = 0, pointsize = 0.5)

### NOW CLUSTERING BY COUNTRY? AND TAKE MEAN OF EVERY VARIABLE ON EVERY CONCERNED YEAR?

Because of the large number of observations, visualizing the clusters obtained with the k-means method is not very informative. Moreover, clustering aims at groups that differ from each other while the observations within a cluster vary little. Here, only 62.9% of the total variance is explained by the variation between clusters (between_SS / total_SS in the output above), which is not enough.
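As a minimal sketch of how this ratio can be inspected, the snippet below fits k-means for several values of k and reports between_SS / total_SS for each. It uses the built-in `iris` data purely as a stand-in for our scaled data, and the range of k and the `nstart` value are assumptions chosen for illustration.

```r
# Illustrative data only: the four numeric columns of the built-in iris data,
# standing in for the report's scaled data.
set.seed(123)
x <- scale(iris[, 1:4])

# Fit k-means for several k and record between_SS / total_SS each time
ks <- 2:8
ratio <- sapply(ks, function(k) {
  km <- kmeans(x, centers = k, nstart = 25)
  km$betweenss / km$totss
})

data.frame(k = ks, between_over_total = round(ratio, 3))
```

Since this ratio tends to rise as k grows, it is best read together with the number of clusters rather than on its own.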

We noticed high multicollinearity in our regressions. Therefore, before computing them, let’s try to remove one of the two variables in each pair whose correlation is at least 0.8 in absolute value. ::: {.cell layout-align=“center” hash=‘report_cache/html/unnamed-chunk-268_9069756d2aa3bd19a606bc1d86321b0d’}

Code
correlation_overall_matrix <- cor(Correlation_overall, use = "everything")
high_cor_pairs <- which(abs(correlation_overall_matrix) >= 0.8, arr.ind = TRUE)

# Displaying the results
for (i in 1:nrow(high_cor_pairs)) {
  row <- high_cor_pairs[i, "row"]
  col <- high_cor_pairs[i, "col"]
  
  # Avoiding duplicate pairs and diagonal elements
  if (row < col) {
    cat(sprintf("Variables: %s and %s, Correlation: %f\n", 
                names(Correlation_overall)[row], names(Correlation_overall)[col], correlation_overall_matrix[row, col]))
  }
}
#> Variables: overallscore and goal1, Correlation: 0.897321
#> Variables: overallscore and goal3, Correlation: 0.946398
#> Variables: goal1 and goal3, Correlation: 0.898778
#> Variables: overallscore and goal4, Correlation: 0.878957
#> Variables: goal1 and goal4, Correlation: 0.840789
#> Variables: goal3 and goal4, Correlation: 0.867045
#> Variables: overallscore and goal6, Correlation: 0.892153
#> Variables: goal3 and goal6, Correlation: 0.858130
#> Variables: overallscore and goal7, Correlation: 0.898513
#> Variables: goal1 and goal7, Correlation: 0.869254
#> Variables: goal3 and goal7, Correlation: 0.874643
#> Variables: goal4 and goal7, Correlation: 0.821983
#> Variables: goal6 and goal7, Correlation: 0.823261
#> Variables: overallscore and goal9, Correlation: 0.853988
#> Variables: goal3 and goal9, Correlation: 0.818935
#> Variables: overallscore and goal11, Correlation: 0.886229
#> Variables: goal1 and goal11, Correlation: 0.804748
#> Variables: goal3 and goal11, Correlation: 0.877758
#> Variables: goal4 and goal11, Correlation: 0.838683
#> Variables: goal6 and goal11, Correlation: 0.812701
#> Variables: goal7 and goal11, Correlation: 0.843002
#> Variables: goal9 and goal12, Correlation: -0.840840
#> Variables: goal12 and goal13, Correlation: 0.902347
#> Variables: overallscore and goal16, Correlation: 0.824936
#> Variables: goal12 and goal16, Correlation: -0.827218
#> Variables: goal9 and GDPpercapita, Correlation: 0.815997
#> Variables: goal12 and GDPpercapita, Correlation: -0.843801
#> Variables: goal13 and GDPpercapita, Correlation: -0.811063
#> Variables: overallscore and internet_usage, Correlation: 0.823361
#> Variables: goal9 and internet_usage, Correlation: 0.900486
#> Variables: goal12 and pf_law, Correlation: -0.877854
#> Variables: goal16 and pf_law, Correlation: 0.849666
#> Variables: pf_religion and pf_assembly, Correlation: 0.835825
#> Variables: pf_assembly and pf_expression, Correlation: 0.893390
#> Variables: goal9 and ef_legal, Correlation: 0.829104
#> Variables: goal12 and ef_legal, Correlation: -0.858948
#> Variables: goal16 and ef_legal, Correlation: 0.853261
#> Variables: pf_law and ef_legal, Correlation: 0.871969

# List of high-correlation pairs
correlation_pairs <- list(
  c("overallscore", "goal1"), c("overallscore", "goal3"), c("goal1", "goal3"),
  c("overallscore", "goal4"), c("goal1", "goal4"), c("goal3", "goal4"),
  c("overallscore", "goal6"), c("goal3", "goal6"),
  c("overallscore", "goal7"), c("goal1", "goal7"), c("goal3", "goal7"), c("goal4", "goal7"), c("goal6", "goal7"),
  c("overallscore", "goal9"), c("goal3", "goal9"),
  c("overallscore", "goal11"), c("goal3", "goal11"), c("goal4", "goal11"), c("goal6", "goal11"), c("goal7", "goal11"),
  c("goal9", "goal12"), c("goal12", "goal13"),
  c("overallscore", "goal16"), c("goal12", "goal16"),
  c("goal9", "GDPpercapita"), c("goal12", "GDPpercapita"),
  c("overallscore", "internet_usage"), c("goal9", "internet_usage"),
  c("goal12", "pf_law"), c("goal16", "pf_law"),
  c("pf_religion", "pf_assembly"), c("pf_assembly", "pf_expression"),
  c("goal9", "ef_legal"), c("goal12", "ef_legal"), c("goal16", "ef_legal"), c("pf_law", "ef_legal")
)

# Flatten the list and count the frequency of each variable
flattened_list <- unlist(correlation_pairs)
frequency_count <- table(flattened_list)
variables_to_remove <- c()

for (pair in correlation_pairs) {
  # Select the variable that appears more frequently for removal
  if (frequency_count[pair[1]] > frequency_count[pair[2]]) {
    variables_to_remove <- c(variables_to_remove, pair[1])
  } else if (frequency_count[pair[1]] < frequency_count[pair[2]]) {
    variables_to_remove <- c(variables_to_remove, pair[2])
  } else {
    # If both appear equally, arbitrarily choose one to remove
    variables_to_remove <- c(variables_to_remove, pair[1])
  }
}

variables_to_remove <- unique(variables_to_remove)
variables_to_remove <- sort(variables_to_remove)
print(variables_to_remove) 
#>  [1] "ef_legal"     "goal11"       "goal12"       "goal16"      
#>  [5] "goal3"        "goal4"        "goal7"        "goal9"       
#>  [9] "overallscore" "pf_assembly"

::: Therefore, we leave the variables “ef_legal”, “goal11”, “goal12”, “goal16”, “goal3”, “goal4”, “goal7”, “goal9”, “overallscore” and “pf_assembly” out of our regressions in order to limit multicollinearity.
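The list of pairs above was typed by hand from the printed output. As a hedged alternative, the sketch below derives the pairs directly from the correlation matrix and then applies the same removal rule (drop the variable that appears in more pairs; on a tie, drop the first one). The helpers `high_cor_pairs_df` and `vars_to_drop` are hypothetical names introduced here, and the example runs on the built-in `mtcars` data rather than on our data.

```r
# Hypothetical helper (not part of the original analysis): list every pair of
# columns whose absolute correlation reaches the cutoff, upper triangle only.
high_cor_pairs_df <- function(df, cutoff = 0.8) {
  cm <- cor(df, use = "pairwise.complete.obs")
  idx <- which(abs(cm) >= cutoff & upper.tri(cm), arr.ind = TRUE)
  data.frame(
    var1 = rownames(cm)[idx[, "row"]],
    var2 = colnames(cm)[idx[, "col"]],
    correlation = cm[idx],
    stringsAsFactors = FALSE
  )
}

# Same rule as above: in each pair, drop the variable involved in more
# high-correlation pairs (ties drop the first variable of the pair).
vars_to_drop <- function(pairs) {
  freq <- table(c(pairs$var1, pairs$var2))
  n1 <- as.vector(freq[pairs$var1])
  n2 <- as.vector(freq[pairs$var2])
  sort(unique(ifelse(n1 >= n2, pairs$var1, pairs$var2)))
}

# Illustration on the built-in mtcars data, not on the report's data
pairs <- high_cor_pairs_df(mtcars, cutoff = 0.8)
pairs
vars_to_drop(pairs)
```

Generating the pairs this way keeps the removal step in line with the correlation output if the data or the cutoff change.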

Now, let’s compute the regressions without these variables. ::: {.cell layout-align=“center” hash=‘report_cache/html/unnamed-chunk-269_bfd70c43778b0617eb3e549236485e6a’}

Code
reg_goal1_all_new <- lm(goal1 ~ goal2 + goal5 + goal6 + goal8 + goal10 + goal13 + goal15 + goal17 + unemployment.rate + GDPpercapita + MilitaryExpenditurePercentGDP + internet_usage + pf_law + pf_security + pf_movement + pf_religion + pf_expression + pf_identity + ef_government + ef_money + ef_trade + ef_regulation + population, data = data_question1)
reg_goal2_all_new <- lm(goal2 ~ goal1 + goal5 + goal6 + goal8 + goal10 + goal13 + goal15 + goal17 + unemployment.rate + GDPpercapita + MilitaryExpenditurePercentGDP + internet_usage + pf_law + pf_security + pf_movement + pf_religion + pf_expression + pf_identity + ef_government + ef_money + ef_trade + ef_regulation + population, data = data_question1)
reg_goal3_all_new <- lm(goal3 ~ goal1 + goal2 + goal5 + goal6 + goal8 + goal10 + goal13 + goal15 + goal17 + unemployment.rate + GDPpercapita + MilitaryExpenditurePercentGDP + internet_usage + pf_law + pf_security + pf_movement + pf_religion + pf_expression + pf_identity + ef_government + ef_money + ef_trade + ef_regulation + population, data = data_question1)
reg_goal4_all_new <- lm(goal4 ~ goal1 + goal2 + goal5 + goal6 + goal8 + goal10 + goal13 + goal15 + goal17 + unemployment.rate + GDPpercapita + MilitaryExpenditurePercentGDP + internet_usage + pf_law + pf_security + pf_movement + pf_religion + pf_expression + pf_identity + ef_government + ef_money + ef_trade + ef_regulation + population, data = data_question1)
reg_goal5_all_new <- lm(goal5 ~ goal1 + goal2 + goal6 + goal8 + goal10 + goal13 + goal15 + goal17 + unemployment.rate + GDPpercapita + MilitaryExpenditurePercentGDP + internet_usage + pf_law + pf_security + pf_movement + pf_religion + pf_expression + pf_identity + ef_government + ef_money + ef_trade + ef_regulation + population, data = data_question1)
reg_goal6_all_new <- lm(goal6 ~ goal1 + goal2 + goal5 + goal8 + goal10 + goal13 + goal15 + goal17 + unemployment.rate + GDPpercapita + MilitaryExpenditurePercentGDP + internet_usage + pf_law + pf_security + pf_movement + pf_religion + pf_expression + pf_identity + ef_government + ef_money + ef_trade + ef_regulation + population, data = data_question1)
reg_goal7_all_new <- lm(goal7 ~ goal1 + goal2 + goal5 + goal6 + goal8 + goal10 + goal13 + goal15 + goal17 + unemployment.rate + GDPpercapita + MilitaryExpenditurePercentGDP + internet_usage + pf_law + pf_security + pf_movement + pf_religion + pf_expression + pf_identity + ef_government + ef_money + ef_trade + ef_regulation + population, data = data_question1)
reg_goal8_all_new <- lm(goal8 ~ goal1 + goal2 + goal5 + goal6 + goal7 + goal10 + goal13 + goal15 + goal17 + unemployment.rate + GDPpercapita + MilitaryExpenditurePercentGDP + internet_usage + pf_law + pf_security + pf_movement + pf_religion + pf_expression + pf_identity + ef_government + ef_money + ef_trade + ef_regulation + population, data = data_question1)
reg_goal9_all_new <- lm(goal9 ~ goal1 + goal2 + goal5 + goal6 + goal7 + goal8 + goal10 + goal13 + goal15 + goal17 + unemployment.rate + GDPpercapita + MilitaryExpenditurePercentGDP + internet_usage + pf_law + pf_security + pf_movement + pf_religion + pf_expression + pf_identity + ef_government + ef_money + ef_trade + ef_regulation + population, data = data_question1)
reg_goal10_all_new <- lm(goal10 ~ goal1 + goal2 + goal5 + goal6 + goal7 + goal8 + goal13 + goal15 + goal17 + unemployment.rate + GDPpercapita + MilitaryExpenditurePercentGDP + internet_usage + pf_law + pf_security + pf_movement + pf_religion + pf_expression + pf_identity + ef_government + ef_money + ef_trade + ef_regulation + population, data = data_question1)
reg_goal11_all_new <- lm(goal11 ~ goal1 + goal2 + goal5 + goal6 + goal7 + goal8 + goal10 + goal13 + goal15 + goal17 + unemployment.rate + GDPpercapita + MilitaryExpenditurePercentGDP + internet_usage + pf_law + pf_security + pf_movement + pf_religion + pf_expression + pf_identity + ef_government + ef_money + ef_trade + ef_regulation + population, data = data_question1)
reg_goal12_all_new <- lm(goal12 ~ goal1 + goal2 + goal5 + goal6 + goal7 + goal8 + goal10 + goal11 + goal13 + goal15 + goal17 + unemployment.rate + GDPpercapita + MilitaryExpenditurePercentGDP + internet_usage + pf_law + pf_security + pf_movement + pf_religion + pf_expression + pf_identity + ef_government + ef_money + ef_trade + ef_regulation + population, data = data_question1)
reg_goal13_all_new <- lm(goal13 ~ goal1 + goal2 + goal5 + goal6 + goal7 + goal8 + goal10 + goal11 + goal12 + goal15 + goal17 + unemployment.rate + GDPpercapita + MilitaryExpenditurePercentGDP + internet_usage + pf_law + pf_security + pf_movement + pf_religion + pf_expression + pf_identity + ef_government + ef_money + ef_trade + ef_regulation + population, data = data_question1)
reg_goal15_all_new <- lm(goal15 ~ goal1 + goal2 + goal5 + goal6 + goal7 + goal8 + goal10 + goal11 + goal12 + goal13 + goal17 + unemployment.rate + GDPpercapita + MilitaryExpenditurePercentGDP + internet_usage + pf_law + pf_security + pf_movement + pf_religion + pf_expression + pf_identity + ef_government + ef_money + ef_trade + ef_regulation + population, data = data_question1)
reg_goal16_all_new <- lm(goal16 ~ goal1 + goal2 + goal5 + goal6 + goal7 + goal8 + goal10 + goal11 + goal12 + goal13 + goal15 + unemployment.rate + GDPpercapita + MilitaryExpenditurePercentGDP + internet_usage + pf_law + pf_security + pf_movement + pf_religion + pf_expression + pf_identity + ef_government + ef_money + ef_trade + ef_regulation + population, data = data_question1)
reg_goal17_all_new <- lm(goal17 ~ goal1 + goal2 + goal5 + goal6 + goal7 + goal8 + goal10 + goal11 + goal12 + goal13 + goal15 + goal16 + unemployment.rate + GDPpercapita + MilitaryExpenditurePercentGDP + internet_usage + pf_law + pf_security + pf_movement + pf_religion + pf_expression + pf_identity + ef_government + ef_money + ef_trade + ef_regulation + population, data = data_question1)

::: The problem is that, even after removing these variables, some multicollinearity may remain. Therefore, we need to analyse the variance inflation factor (VIF) of each regression and adapt the model accordingly. ::: {.cell layout-align=“center” hash=‘report_cache/html/unnamed-chunk-270_aba7e2a516e400b394a265936ef0babc’}
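As a rough sketch of what the VIF measures, the snippet below computes it by hand in base R: each predictor is regressed on all the others, and its VIF is 1 / (1 - R²) of that auxiliary regression. For numeric predictors this is the same quantity that `vif()` reports. The helper `manual_vif`, the threshold of 5 and the use of the built-in `mtcars` data are assumptions made for illustration only.

```r
# Hypothetical helper (base R only): VIF_j = 1 / (1 - R_j^2), where R_j^2 comes
# from regressing predictor j on all the other predictors.
manual_vif <- function(data, predictors) {
  sapply(predictors, function(p) {
    others <- setdiff(predictors, p)
    r2 <- summary(lm(reformulate(others, response = p), data = data))$r.squared
    1 / (1 - r2)
  })
}

# Illustration on the built-in mtcars data, not on the report's data
predictors <- c("disp", "hp", "wt", "drat", "qsec")
threshold <- 5  # assumed cutoff; 5 and 10 are both common rules of thumb

# Drop the predictor with the highest VIF until all VIFs fall below the cutoff
repeat {
  v <- manual_vif(mtcars, predictors)
  if (max(v) < threshold || length(predictors) <= 2) break
  predictors <- setdiff(predictors, names(which.max(v)))
}

round(v, 2)
final_model <- lm(reformulate(predictors, response = "mpg"), data = mtcars)
```

Dropping one predictor at a time and recomputing matters because removing a single variable changes the VIF of every remaining one.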

Code
#for reg1
nullmod <- lm(goal1 ~ 1, data = data_question1)
selmod <- step(reg_goal1_all_new, scope=list(lower=nullmod, upper=reg_goal1_all_new), direction="backward") 
#> Start:  AIC=12046
#> goal1 ~ goal2 + goal5 + goal6 + goal8 + goal10 + goal13 + goal15 + 
#>     goal17 + unemployment.rate + GDPpercapita + MilitaryExpenditurePercentGDP + 
#>     internet_usage + pf_law + pf_security + pf_movement + pf_religion + 
#>     pf_expression + pf_identity + ef_government + ef_money + 
#>     ef_trade + ef_regulation + population
#> 
#>                                 Df Sum of Sq    RSS   AIC
#> - pf_security                    1        29 487944 12044
#> - population                     1       137 488051 12045
#> <none>                                       487914 12046
#> - ef_money                       1       757 488671 12047
#> - pf_expression                  1       988 488902 12048
#> - goal2                          1      1112 489026 12049
#> - ef_regulation                  1      1649 489564 12052
#> - pf_law                         1      2447 490361 12055
#> - pf_movement                    1      2823 490737 12057
#> - MilitaryExpenditurePercentGDP  1      3155 491070 12058
#> - pf_identity                    1      3243 491157 12059
#> - ef_trade                       1      5387 493301 12068
#> - goal13                         1      6461 494376 12073
#> - goal5                          1     12725 500639 12101
#> - goal15                         1     17865 505780 12124
#> - internet_usage                 1     18294 506208 12126
#> - GDPpercapita                   1     19951 507865 12133
#> - goal17                         1     21678 509592 12141
#> - pf_religion                    1     24542 512457 12153
#> - goal10                         1     24732 512647 12154
#> - goal8                          1     25162 513076 12156
#> - unemployment.rate              1     39684 527599 12218
#> - ef_government                  1     41879 529793 12227
#> - goal6                          1     95632 583546 12442
#> 
#> Step:  AIC=12044
#> goal1 ~ goal2 + goal5 + goal6 + goal8 + goal10 + goal13 + goal15 + 
#>     goal17 + unemployment.rate + GDPpercapita + MilitaryExpenditurePercentGDP + 
#>     internet_usage + pf_law + pf_movement + pf_religion + pf_expression + 
#>     pf_identity + ef_government + ef_money + ef_trade + ef_regulation + 
#>     population
#> 
#>                                 Df Sum of Sq    RSS   AIC
#> - population                     1       137 488080 12043
#> <none>                                       487944 12044
#> - ef_money                       1       752 488696 12046
#> - pf_expression                  1      1000 488944 12047
#> - goal2                          1      1083 489027 12047
#> - ef_regulation                  1      1644 489587 12050
#> - pf_law                         1      2459 490402 12053
#> - pf_movement                    1      2827 490770 12055
#> - MilitaryExpenditurePercentGDP  1      3252 491196 12057
#> - pf_identity                    1      3336 491279 12057
#> - ef_trade                       1      5363 493306 12066
#> - goal13                         1      6476 494419 12071
#> - goal5                          1     12755 500698 12100
#> - goal15                         1     17930 505873 12122
#> - internet_usage                 1     18333 506276 12124
#> - GDPpercapita                   1     19934 507878 12131
#> - goal17                         1     21819 509763 12140
#> - pf_religion                    1     24606 512549 12152
#> - goal8                          1     25158 513102 12154
#> - goal10                         1     26034 513977 12158
#> - unemployment.rate              1     39697 527640 12216
#> - ef_government                  1     43595 531539 12233
#> - goal6                          1     96758 584702 12445
#> 
#> Step:  AIC=12043
#> goal1 ~ goal2 + goal5 + goal6 + goal8 + goal10 + goal13 + goal15 + 
#>     goal17 + unemployment.rate + GDPpercapita + MilitaryExpenditurePercentGDP + 
#>     internet_usage + pf_law + pf_movement + pf_religion + pf_expression + 
#>     pf_identity + ef_government + ef_money + ef_trade + ef_regulation
#> 
#>                                 Df Sum of Sq    RSS   AIC
#> <none>                                       488080 12043
#> - ef_money                       1       770 488851 12044
#> - goal2                          1      1008 489088 12045
#> - pf_expression                  1      1311 489391 12047
#> - ef_regulation                  1      1687 489768 12048
#> - pf_law                         1      2552 490633 12052
#> - pf_movement                    1      2873 490953 12054
#> - MilitaryExpenditurePercentGDP  1      3148 491228 12055
#> - pf_identity                    1      3200 491280 12055
#> - ef_trade                       1      5647 493728 12066
#> - goal13                         1      6449 494529 12070
#> - goal5                          1     12830 500910 12099
#> - goal15                         1     17919 506000 12121
#> - internet_usage                 1     18293 506373 12123
#> - GDPpercapita                   1     20037 508118 12130
#> - goal17                         1     23851 511932 12147
#> - goal8                          1     25069 513149 12152
#> - pf_religion                    1     27519 515599 12163
#> - goal10                         1     27822 515902 12164
#> - unemployment.rate              1     39585 527665 12214
#> - ef_government                  1     43704 531784 12232
#> - goal6                          1     98070 586151 12448
summary(selmod)
#> 
#> Call:
#> lm(formula = goal1 ~ goal2 + goal5 + goal6 + goal8 + goal10 + 
#>     goal13 + goal15 + goal17 + unemployment.rate + GDPpercapita + 
#>     MilitaryExpenditurePercentGDP + internet_usage + pf_law + 
#>     pf_movement + pf_religion + pf_expression + pf_identity + 
#>     ef_government + ef_money + ef_trade + ef_regulation, data = data_question1)
#> 
#> Residuals:
#>    Min     1Q Median     3Q    Max 
#> -42.06  -8.92   0.11   9.74  41.83 
#> 
#> Coefficients:
#>                                Estimate Std. Error t value Pr(>|t|)
#> (Intercept)                   -7.85e+01   7.75e+00  -10.13  < 2e-16
#> goal2                          9.91e-02   4.64e-02    2.13  0.03304
#> goal5                         -2.54e-01   3.33e-02   -7.61  4.0e-14
#> goal6                          9.11e-01   4.33e-02   21.04  < 2e-16
#> goal8                          8.34e-01   7.84e-02   10.64  < 2e-16
#> goal10                         1.82e-01   1.63e-02   11.21  < 2e-16
#> goal13                        -1.89e-01   3.51e-02   -5.40  7.5e-08
#> goal15                        -2.51e-01   2.79e-02   -9.00  < 2e-16
#> goal17                         4.05e-01   3.90e-02   10.38  < 2e-16
#> unemployment.rate              1.05e+02   7.84e+00   13.37  < 2e-16
#> GDPpercapita                  -3.20e-04   3.36e-05   -9.51  < 2e-16
#> MilitaryExpenditurePercentGDP  1.28e+00   3.39e-01    3.77  0.00017
#> internet_usage                 1.90e+01   2.09e+00    9.09  < 2e-16
#> pf_law                         1.71e+00   5.04e-01    3.39  0.00070
#> pf_movement                    1.34e+00   3.73e-01    3.60  0.00032
#> pf_religion                   -3.86e+00   3.47e-01  -11.15  < 2e-16
#> pf_expression                 -8.35e-01   3.43e-01   -2.43  0.01506
#> pf_identity                    7.84e-01   2.06e-01    3.80  0.00015
#> ef_government                  5.24e+00   3.73e-01   14.05  < 2e-16
#> ef_money                      -6.59e-01   3.53e-01   -1.86  0.06233
#> ef_trade                       2.36e+00   4.67e-01    5.05  4.8e-07
#> ef_regulation                 -1.27e+00   4.60e-01   -2.76  0.00582
#>                                  
#> (Intercept)                   ***
#> goal2                         *  
#> goal5                         ***
#> goal6                         ***
#> goal8                         ***
#> goal10                        ***
#> goal13                        ***
#> goal15                        ***
#> goal17                        ***
#> unemployment.rate             ***
#> GDPpercapita                  ***
#> MilitaryExpenditurePercentGDP ***
#> internet_usage                ***
#> pf_law                        ***
#> pf_movement                   ***
#> pf_religion                   ***
#> pf_expression                 *  
#> pf_identity                   ***
#> ef_government                 ***
#> ef_money                      .  
#> ef_trade                      ***
#> ef_regulation                 ** 
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Residual standard error: 14.9 on 2204 degrees of freedom
#> Multiple R-squared:  0.791,  Adjusted R-squared:  0.789 
#> F-statistic:  398 on 21 and 2204 DF,  p-value: <2e-16
vif(selmod) # pf_law has the highest VIF (6.59) -> get rid of it
#>                         goal2                         goal5 
#>                          2.06                          2.99 
#>                         goal6                         goal8 
#>                          4.53                          4.96 
#>                        goal10                        goal13 
#>                          2.20                          4.51 
#>                        goal15                        goal17 
#>                          1.39                          2.10 
#>             unemployment.rate                  GDPpercapita 
#>                          1.88                          4.35 
#> MilitaryExpenditurePercentGDP                internet_usage 
#>                          1.48                          4.22 
#>                        pf_law                   pf_movement 
#>                          6.59                          3.77 
#>                   pf_religion                 pf_expression 
#>                          3.48                          5.01 
#>                   pf_identity                 ef_government 
#>                          2.40                          1.87 
#>                      ef_money                      ef_trade 
#>                          2.74                          4.15 
#>                 ef_regulation 
#>                          2.38
reg_goal1_all_new <- lm(goal1 ~ goal5 + goal6 + goal8 + goal10 + goal13 + goal15 + goal17 + 
                          unemployment.rate + GDPpercapita + MilitaryExpenditurePercentGDP + 
                          internet_usage + pf_movement + pf_religion + ef_government + 
                          ef_money + ef_trade + ef_regulation, data = data_question1)
selmod <- step(reg_goal1_all_new, scope=list(lower=nullmod, upper=reg_goal1_all_new), direction="backward") 
#> Start:  AIC=12068
#> goal1 ~ goal5 + goal6 + goal8 + goal10 + goal13 + goal15 + goal17 + 
#>     unemployment.rate + GDPpercapita + MilitaryExpenditurePercentGDP + 
#>     internet_usage + pf_movement + pf_religion + ef_government + 
#>     ef_money + ef_trade + ef_regulation
#> 
#>                                 Df Sum of Sq    RSS   AIC
#> <none>                                       495435 12068
#> - ef_money                       1      1091 496526 12071
#> - ef_regulation                  1      1796 497231 12074
#> - pf_movement                    1      3149 498584 12080
#> - MilitaryExpenditurePercentGDP  1      4152 499587 12085
#> - ef_trade                       1      7945 503380 12101
#> - goal13                         1      8728 504163 12105
#> - goal5                          1     10147 505582 12111
#> - goal15                         1     18676 514111 12148
#> - GDPpercapita                   1     19379 514814 12151
#> - goal17                         1     19412 514847 12152
#> - internet_usage                 1     20313 515748 12155
#> - goal10                         1     29250 524685 12194
#> - goal8                          1     34893 530328 12218
#> - pf_religion                    1     38106 533541 12231
#> - ef_government                  1     41726 537160 12246
#> - unemployment.rate              1     50784 546219 12283
#> - goal6                          1    144694 640129 12636
vif(selmod)
#>                         goal5                         goal6 
#>                          2.78                          3.67 
#>                         goal8                        goal10 
#>                          4.17                          2.09 
#>                        goal13                        goal15 
#>                          4.03                          1.37 
#>                        goal17             unemployment.rate 
#>                          2.00                          1.67 
#>                  GDPpercapita MilitaryExpenditurePercentGDP 
#>                          4.04                          1.42 
#>                internet_usage                   pf_movement 
#>                          4.13                          3.46 
#>                   pf_religion                 ef_government 
#>                          2.62                          1.72 
#>                      ef_money                      ef_trade 
#>                          2.70                          3.98 
#>                 ef_regulation 
#>                          2.17
reg_goal1_all_new <- lm(goal1 ~ goal5 + goal6 + goal8 + goal10 + goal13 + goal15 + goal17 + 
                          unemployment.rate + GDPpercapita + MilitaryExpenditurePercentGDP + 
                          internet_usage + pf_movement + pf_religion + ef_government + 
                          ef_money + ef_trade + ef_regulation, data = data_question1)
#for reg2
nullmod <- lm(goal2 ~ 1, data = data_question1)
selmod <- step(reg_goal2_all_new, scope=list(lower=nullmod, upper=reg_goal2_all_new), direction="backward") 
#> Start:  AIC=8460
#> goal2 ~ goal1 + goal5 + goal6 + goal8 + goal10 + goal13 + goal15 + 
#>     goal17 + unemployment.rate + GDPpercapita + MilitaryExpenditurePercentGDP + 
#>     internet_usage + pf_law + pf_security + pf_movement + pf_religion + 
#>     pf_expression + pf_identity + ef_government + ef_money + 
#>     ef_trade + ef_regulation + population
#> 
#>                                 Df Sum of Sq    RSS  AIC
#> - ef_trade                       1         0  97452 8458
#> - pf_expression                  1         9  97460 8459
#> - ef_government                  1        19  97470 8459
#> - internet_usage                 1        34  97486 8459
#> - pf_religion                    1        39  97490 8459
#> - GDPpercapita                   1        81  97533 8460
#> <none>                                        97452 8460
#> - goal10                         1        89  97540 8460
#> - pf_law                         1       110  97562 8461
#> - goal17                         1       220  97672 8463
#> - goal1                          1       222  97674 8463
#> - goal15                         1       277  97729 8465
#> - pf_identity                    1       366  97818 8467
#> - unemployment.rate              1       558  98009 8471
#> - MilitaryExpenditurePercentGDP  1       576  98028 8472
#> - ef_regulation                  1       599  98051 8472
#> - ef_money                       1       758  98210 8476
#> - pf_movement                    1       800  98252 8477
#> - population                     1      1370  98822 8489
#> - goal5                          1      2755 100207 8520
#> - pf_security                    1      3514 100965 8537
#> - goal13                         1      3918 101370 8546
#> - goal8                          1      4831 102282 8566
#> - goal6                          1      7753 105204 8629
#> 
#> Step:  AIC=8458
#> goal2 ~ goal1 + goal5 + goal6 + goal8 + goal10 + goal13 + goal15 + 
#>     goal17 + unemployment.rate + GDPpercapita + MilitaryExpenditurePercentGDP + 
#>     internet_usage + pf_law + pf_security + pf_movement + pf_religion + 
#>     pf_expression + pf_identity + ef_government + ef_money + 
#>     ef_regulation + population
#> 
#>                                 Df Sum of Sq    RSS  AIC
#> - pf_expression                  1         8  97460 8457
#> - ef_government                  1        19  97471 8457
#> - internet_usage                 1        34  97486 8457
#> - pf_religion                    1        40  97492 8457
#> - GDPpercapita                   1        81  97533 8458
#> <none>                                        97452 8458
#> - goal10                         1        88  97540 8458
#> - pf_law                         1       110  97562 8459
#> - goal17                         1       223  97675 8461
#> - goal1                          1       227  97679 8462
#> - goal15                         1       277  97729 8463
#> - pf_identity                    1       370  97822 8465
#> - unemployment.rate              1       561  98013 8469
#> - MilitaryExpenditurePercentGDP  1       576  98028 8470
#> - ef_regulation                  1       619  98071 8470
#> - pf_movement                    1       809  98261 8475
#> - ef_money                       1      1047  98499 8480
#> - population                     1      1383  98835 8488
#> - goal5                          1      2755 100207 8518
#> - pf_security                    1      3524 100976 8535
#> - goal13                         1      3935 101387 8545
#> - goal8                          1      4994 102446 8568
#> - goal6                          1      7809 105261 8628
#> 
#> Step:  AIC=8457
#> goal2 ~ goal1 + goal5 + goal6 + goal8 + goal10 + goal13 + goal15 + 
#>     goal17 + unemployment.rate + GDPpercapita + MilitaryExpenditurePercentGDP + 
#>     internet_usage + pf_law + pf_security + pf_movement + pf_religion + 
#>     pf_identity + ef_government + ef_money + ef_regulation + 
#>     population
#> 
#>                                 Df Sum of Sq    RSS  AIC
#> - ef_government                  1        18  97478 8455
#> - internet_usage                 1        39  97500 8455
#> - pf_religion                    1        73  97534 8456
#> - GDPpercapita                   1        79  97539 8456
#> - goal10                         1        83  97544 8456
#> <none>                                        97460 8457
#> - pf_law                         1       130  97591 8458
#> - goal1                          1       230  97691 8460
#> - goal17                         1       243  97703 8460
#> - goal15                         1       281  97742 8461
#> - pf_identity                    1       362  97823 8463
#> - unemployment.rate              1       553  98013 8467
#> - MilitaryExpenditurePercentGDP  1       568  98029 8468
#> - ef_regulation                  1       613  98074 8469
#> - pf_movement                    1       904  98364 8475
#> - ef_money                       1      1038  98499 8478
#> - population                     1      1428  98888 8487
#> - goal5                          1      2775 100235 8517
#> - pf_security                    1      3516 100977 8533
#> - goal13                         1      3963 101423 8543
#> - goal8                          1      5090 102550 8568
#> - goal6                          1      7809 105270 8626
#> 
#> Step:  AIC=8455
#> goal2 ~ goal1 + goal5 + goal6 + goal8 + goal10 + goal13 + goal15 + 
#>     goal17 + unemployment.rate + GDPpercapita + MilitaryExpenditurePercentGDP + 
#>     internet_usage + pf_law + pf_security + pf_movement + pf_religion + 
#>     pf_identity + ef_money + ef_regulation + population
#> 
#>                                 Df Sum of Sq    RSS  AIC
#> - internet_usage                 1        39  97518 8454
#> - pf_religion                    1        59  97538 8454
#> - goal10                         1        74  97553 8455
#> - GDPpercapita                   1        77  97555 8455
#> <none>                                        97478 8455
#> - pf_law                         1       161  97640 8457
#> - goal17                         1       284  97763 8459
#> - goal1                          1       294  97773 8460
#> - goal15                         1       304  97783 8460
#> - pf_identity                    1       369  97847 8461
#> - unemployment.rate              1       535  98013 8465
#> - MilitaryExpenditurePercentGDP  1       581  98060 8466
#> - ef_regulation                  1       598  98076 8467
#> - pf_movement                    1       890  98368 8473
#> - ef_money                       1      1109  98587 8478
#> - population                     1      1426  98904 8485
#> - goal5                          1      2808 100287 8516
#> - pf_security                    1      3528 101006 8532
#> - goal13                         1      3976 101454 8542
#> - goal8                          1      5078 102556 8566
#> - goal6                          1      7792 105270 8624
#> 
#> Step:  AIC=8454
#> goal2 ~ goal1 + goal5 + goal6 + goal8 + goal10 + goal13 + goal15 + 
#>     goal17 + unemployment.rate + GDPpercapita + MilitaryExpenditurePercentGDP + 
#>     pf_law + pf_security + pf_movement + pf_religion + pf_identity + 
#>     ef_money + ef_regulation + population
#> 
#>                                 Df Sum of Sq    RSS  AIC
#> - pf_religion                    1        63  97580 8453
#> - goal10                         1        81  97599 8454
#> <none>                                        97518 8454
#> - GDPpercapita                   1       144  97662 8455
#> - pf_law                         1       174  97692 8456
#> - goal17                         1       271  97788 8458
#> - goal15                         1       272  97789 8458
#> - goal1                          1       352  97869 8460
#> - pf_identity                    1       373  97891 8460
#> - unemployment.rate              1       539  98056 8464
#> - ef_regulation                  1       560  98078 8465
#> - MilitaryExpenditurePercentGDP  1       582  98100 8465
#> - pf_movement                    1       972  98490 8474
#> - ef_money                       1      1162  98680 8478
#> - population                     1      1424  98942 8484
#> - goal5                          1      3114 100632 8522
#> - pf_security                    1      3636 101154 8533
#> - goal13                         1      4176 101694 8545
#> - goal8                          1      5346 102864 8571
#> - goal6                          1      8090 105607 8629
#> 
#> Step:  AIC=8453
#> goal2 ~ goal1 + goal5 + goal6 + goal8 + goal10 + goal13 + goal15 + 
#>     goal17 + unemployment.rate + GDPpercapita + MilitaryExpenditurePercentGDP + 
#>     pf_law + pf_security + pf_movement + pf_identity + ef_money + 
#>     ef_regulation + population
#> 
#>                                 Df Sum of Sq    RSS  AIC
#> <none>                                        97580 8453
#> - goal10                         1       111  97691 8454
#> - GDPpercapita                   1       139  97719 8454
#> - pf_law                         1       231  97812 8457
#> - goal17                         1       252  97833 8457
#> - goal15                         1       277  97857 8458
#> - pf_identity                    1       426  98006 8461
#> - goal1                          1       446  98026 8461
#> - unemployment.rate              1       554  98134 8464
#> - MilitaryExpenditurePercentGDP  1       567  98147 8464
#> - ef_regulation                  1       568  98148 8464
#> - ef_money                       1      1195  98775 8478
#> - pf_movement                    1      1753  99333 8491
#> - population                     1      1819  99399 8492
#> - goal5                          1      3199 100779 8523
#> - pf_security                    1      3876 101456 8538
#> - goal13                         1      4132 101712 8544
#> - goal8                          1      5287 102867 8569
#> - goal6                          1      8031 105611 8627
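
As a reading aid for the `step()` traces above: for a linear model, `step()` scores each candidate with `extractAIC()`, which computes the following quantity, up to an additive constant that is the same for every model fitted to the same data. The symbols $n$, $\mathrm{edf}$ and the subscripts are notation introduced here for illustration only.

$$
\mathrm{AIC} = n \log\!\left(\frac{\mathrm{RSS}}{n}\right) + 2\,\mathrm{edf}
$$

Here $n$ is the number of observations and $\mathrm{edf}$ the number of estimated coefficients. Removing a term with one degree of freedom therefore changes the criterion by

$$
\Delta\mathrm{AIC} = n \log\!\left(\frac{\mathrm{RSS}_{\text{reduced}}}{\mathrm{RSS}_{\text{full}}}\right) - 2
$$

A term is dropped only while the increase in RSS it causes is small enough for this difference to be negative. The selection stops once `<none>` sits at the top of the table, as in the last step above.
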
vif(selmod) 
#>                         goal1                         goal5 
#>                          3.94                          2.72 
#>                         goal6                         goal8 
#>                          4.93                          4.49 
#>                        goal10                        goal13 
#>                          2.39                          4.26 
#>                        goal15                        goal17 
#>                          1.40                          2.12 
#>             unemployment.rate                  GDPpercapita 
#>                          1.93                          3.89 
#> MilitaryExpenditurePercentGDP                        pf_law 
#>                          1.47                          5.95 
#>                   pf_security                   pf_movement 
#>                          2.01                          2.52 
#>                   pf_identity                      ef_money 
#>                          2.38                          1.94 
#>                 ef_regulation                    population 
#>                          2.06                          1.24
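
The VIF values printed above can be reproduced by hand, which may help when deciding which predictor to drop. The following is a minimal sketch, not the code used in this report: it runs on the built-in `mtcars` data rather than `data_question1`, the helper name `manual_vif` is hypothetical, and it assumes a model with numeric predictors only (no factors or interactions), where the VIF of predictor j equals 1 / (1 - R²) from regressing j on the other predictors.

```r
# Illustrative only: built-in mtcars data, not data_question1.
fit <- lm(mpg ~ disp + hp + wt + qsec, data = mtcars)

# Hypothetical helper: VIF of predictor j is 1 / (1 - R^2_j), where R^2_j
# comes from regressing predictor j on all the other predictors.
manual_vif <- function(model) {
  X <- model.matrix(model)[, -1, drop = FALSE]  # drop the intercept column
  sapply(colnames(X), function(j) {
    others <- X[, colnames(X) != j, drop = FALSE]
    r2 <- summary(lm(X[, j] ~ others))$r.squared
    1 / (1 - r2)
  })
}

vifs <- manual_vif(fit)
print(round(vifs, 2))

# Predictor with the largest VIF, i.e. the first candidate to remove
worst <- names(which.max(vifs))
print(worst)
```

The predictor returned in `worst` plays the same role as the variable noted next to the `vif(selmod)` calls in this section.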
reg_goal2_all_new <- lm(goal2 ~ goal5 + goal6 + goal8 + goal10 + goal13 + goal15 + goal17 + 
                          unemployment.rate + MilitaryExpenditurePercentGDP + internet_usage + 
                          pf_law + pf_security + pf_movement + pf_identity + ef_money + 
                          ef_trade + ef_regulation + population, data = data_question1)
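# reg5: backward selection for goal5
# With direction = "backward", only the right-hand side of the lower model is used
nullmod <- lm(goal5 ~ 1, data = data_question1)
selmod <- step(reg_goal5_all_new, scope=list(lower=nullmod, upper=reg_goal5_all_new), direction="backward")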
#> Start:  AIC=9964
#> goal5 ~ goal1 + goal2 + goal6 + goal8 + goal10 + goal13 + goal15 + 
#>     goal17 + unemployment.rate + GDPpercapita + MilitaryExpenditurePercentGDP + 
#>     internet_usage + pf_law + pf_security + pf_movement + pf_religion + 
#>     pf_expression + pf_identity + ef_government + ef_money + 
#>     ef_trade + ef_regulation + population
#> 
#>                                 Df Sum of Sq    RSS   AIC
#> - pf_law                         1        32 191527  9962
#> - pf_religion                    1        62 191557  9963
#> - population                     1        97 191592  9963
#> - ef_trade                       1       122 191617  9963
#> <none>                                       191495  9964
#> - goal15                         1       205 191700  9964
#> - pf_movement                    1       261 191756  9965
#> - unemployment.rate              1       279 191774  9965
#> - pf_expression                  1       332 191827  9966
#> - ef_money                       1       749 192244  9971
#> - GDPpercapita                   1       985 192479  9973
#> - goal8                          1      1085 192580  9975
#> - pf_security                    1      2586 194081  9992
#> - goal13                         1      3266 194761 10000
#> - MilitaryExpenditurePercentGDP  1      3919 195414 10007
#> - goal6                          1      4466 195961 10013
#> - goal1                          1      4994 196489 10019
#> - goal2                          1      5414 196909 10024
#> - ef_regulation                  1      6414 197909 10035
#> - goal17                         1      7562 199057 10048
#> - goal10                         1      7799 199293 10051
#> - internet_usage                 1      7890 199385 10052
#> - pf_identity                    1      8103 199598 10054
#> - ef_government                  1      8550 200045 10059
#> 
#> Step:  AIC=9962
#> goal5 ~ goal1 + goal2 + goal6 + goal8 + goal10 + goal13 + goal15 + 
#>     goal17 + unemployment.rate + GDPpercapita + MilitaryExpenditurePercentGDP + 
#>     internet_usage + pf_security + pf_movement + pf_religion + 
#>     pf_expression + pf_identity + ef_government + ef_money + 
#>     ef_trade + ef_regulation + population
#> 
#>                                 Df Sum of Sq    RSS   AIC
#> - pf_religion                    1        74 191601  9961
#> - population                     1        90 191617  9961
#> - ef_trade                       1       110 191637  9962
#> <none>                                       191527  9962
#> - goal15                         1       203 191730  9963
#> - unemployment.rate              1       250 191777  9963
#> - pf_movement                    1       261 191788  9963
#> - pf_expression                  1       303 191830  9964
#> - ef_money                       1       731 192258  9969
#> - GDPpercapita                   1       959 192486  9972
#> - goal8                          1      1169 192696  9974
#> - pf_security                    1      2596 194123  9990
#> - goal13                         1      3551 195078 10001
#> - MilitaryExpenditurePercentGDP  1      3898 195425 10005
#> - goal6                          1      4510 196037 10012
#> - goal1                          1      4963 196490 10017
#> - goal2                          1      5393 196920 10022
#> - ef_regulation                  1      7069 198596 10041
#> - goal17                         1      7634 199161 10047
#> - goal10                         1      7773 199300 10049
#> - internet_usage                 1      7872 199398 10050
#> - pf_identity                    1      8112 199639 10053
#> - ef_government                  1      9374 200901 10067
#> 
#> Step:  AIC=9961
#> goal5 ~ goal1 + goal2 + goal6 + goal8 + goal10 + goal13 + goal15 + 
#>     goal17 + unemployment.rate + GDPpercapita + MilitaryExpenditurePercentGDP + 
#>     internet_usage + pf_security + pf_movement + pf_expression + 
#>     pf_identity + ef_government + ef_money + ef_trade + ef_regulation + 
#>     population
#> 
#>                                 Df Sum of Sq    RSS   AIC
#> - population                     1        43 191645  9960
#> - ef_trade                       1       123 191724  9961
#> <none>                                       191601  9961
#> - goal15                         1       212 191813  9962
#> - pf_expression                  1       229 191830  9962
#> - unemployment.rate              1       249 191850  9962
#> - pf_movement                    1       393 191994  9964
#> - ef_money                       1       703 192305  9967
#> - GDPpercapita                   1       934 192535  9970
#> - goal8                          1      1232 192833  9974
#> - pf_security                    1      2672 194274  9990
#> - goal13                         1      3508 195110 10000
#> - MilitaryExpenditurePercentGDP  1      3845 195447 10004
#> - goal6                          1      4593 196194 10012
#> - goal2                          1      5369 196970 10021
#> - goal1                          1      5493 197094 10022
#> - ef_regulation                  1      7138 198740 10041
#> - goal17                         1      7561 199162 10045
#> - internet_usage                 1      7919 199521 10049
#> - goal10                         1      8295 199896 10054
#> - pf_identity                    1      8756 200357 10059
#> - ef_government                  1      9409 201010 10066
#> 
#> Step:  AIC=9960
#> goal5 ~ goal1 + goal2 + goal6 + goal8 + goal10 + goal13 + goal15 + 
#>     goal17 + unemployment.rate + GDPpercapita + MilitaryExpenditurePercentGDP + 
#>     internet_usage + pf_security + pf_movement + pf_expression + 
#>     pf_identity + ef_government + ef_money + ef_trade + ef_regulation
#> 
#>                                 Df Sum of Sq    RSS   AIC
#> - ef_trade                       1       141 191785  9959
#> <none>                                       191645  9960
#> - goal15                         1       183 191828  9960
#> - pf_expression                  1       215 191860  9960
#> - unemployment.rate              1       244 191889  9961
#> - pf_movement                    1       359 192004  9962
#> - ef_money                       1       729 192374  9966
#> - GDPpercapita                   1       941 192586  9969
#> - goal8                          1      1279 192924  9973
#> - pf_security                    1      2663 194308  9989
#> - goal13                         1      3534 195179  9998
#> - MilitaryExpenditurePercentGDP  1      3805 195450 10002
#> - goal6                          1      4549 196194 10010
#> - goal1                          1      5452 197096 10020
#> - goal2                          1      5632 197277 10022
#> - ef_regulation                  1      7158 198803 10039
#> - goal17                         1      7606 199251 10044
#> - internet_usage                 1      7936 199580 10048
#> - goal10                         1      8612 200256 10056
#> - pf_identity                    1      9078 200723 10061
#> - ef_government                  1      9599 201243 10067
#> 
#> Step:  AIC=9959
#> goal5 ~ goal1 + goal2 + goal6 + goal8 + goal10 + goal13 + goal15 + 
#>     goal17 + unemployment.rate + GDPpercapita + MilitaryExpenditurePercentGDP + 
#>     internet_usage + pf_security + pf_movement + pf_expression + 
#>     pf_identity + ef_government + ef_money + ef_regulation
#> 
#>                                 Df Sum of Sq    RSS   AIC
#> - goal15                         1       164 191950  9959
#> <none>                                       191785  9959
#> - pf_expression                  1       244 192029  9960
#> - unemployment.rate              1       276 192062  9961
#> - pf_movement                    1       305 192090  9961
#> - ef_money                       1       593 192378  9964
#> - GDPpercapita                   1       941 192727  9968
#> - goal8                          1      1171 192957  9971
#> - pf_security                    1      2778 194563  9989
#> - goal13                         1      3590 195375  9999
#> - MilitaryExpenditurePercentGDP  1      3878 195663 10002
#> - goal6                          1      4446 196232 10008
#> - goal2                          1      5671 197457 10022
#> - goal1                          1      5777 197562 10023
#> - ef_regulation                  1      7093 198878 10038
#> - goal17                         1      7825 199610 10046
#> - internet_usage                 1      8387 200172 10053
#> - goal10                         1      8492 200277 10054
#> - pf_identity                    1      8938 200723 10059
#> - ef_government                  1      9784 201569 10068
#> 
#> Step:  AIC=9959
#> goal5 ~ goal1 + goal2 + goal6 + goal8 + goal10 + goal13 + goal17 + 
#>     unemployment.rate + GDPpercapita + MilitaryExpenditurePercentGDP + 
#>     internet_usage + pf_security + pf_movement + pf_expression + 
#>     pf_identity + ef_government + ef_money + ef_regulation
#> 
#>                                 Df Sum of Sq    RSS   AIC
#> <none>                                       191950  9959
#> - unemployment.rate              1       196 192145  9960
#> - pf_expression                  1       223 192173  9960
#> - pf_movement                    1       356 192305  9961
#> - ef_money                       1       588 192537  9964
#> - GDPpercapita                   1      1022 192972  9969
#> - goal8                          1      1192 193142  9971
#> - pf_security                    1      2997 194947  9992
#> - goal13                         1      3613 195562  9999
#> - MilitaryExpenditurePercentGDP  1      3903 195852 10002
#> - goal6                          1      4423 196373 10008
#> - goal2                          1      5552 197501 10021
#> - goal1                          1      6541 198490 10032
#> - ef_regulation                  1      7038 198988 10038
#> - goal17                         1      7700 199649 10045
#> - goal10                         1      8600 200550 10055
#> - pf_identity                    1      9229 201179 10062
#> - internet_usage                 1      9430 201379 10064
#> - ef_government                  1     10238 202187 10073
vif(selmod) # goal6 has the highest VIF (5.26)
#>                         goal1                         goal2 
#>                          4.12                          2.05 
#>                         goal6                         goal8 
#>                          5.26                          4.91 
#>                        goal10                        goal13 
#>                          2.12                          4.28 
#>                        goal17             unemployment.rate 
#>                          2.04                          1.80 
#>                  GDPpercapita MilitaryExpenditurePercentGDP 
#>                          4.18                          1.44 
#>                internet_usage                   pf_security 
#>                          3.95                          2.01 
#>                   pf_movement                 pf_expression 
#>                          3.36                          3.81 
#>                   pf_identity                 ef_government 
#>                          2.24                          1.77 
#>                      ef_money                 ef_regulation 
#>                          2.02                          2.05
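
Dropping a high-VIF predictor and re-running the selection, done in this section by retyping the full formula, can also be expressed with `update()`. This is a small sketch under stated assumptions: it uses the built-in `mtcars` data and illustrative variable names, not the report's `data_question1` models.

```r
# Illustrative only: built-in mtcars data, not data_question1.
full <- lm(mpg ~ disp + hp + wt + qsec, data = mtcars)

# Remove one predictor without retyping the formula
reduced <- update(full, . ~ . - disp)

# Re-run backward selection on the reduced model; trace = 0 hides the step log
sel <- step(reduced, direction = "backward", trace = 0)

print(formula(reduced))
print(formula(sel))
```

Using `update()` keeps the remaining terms identical between refits, which avoids accidentally adding or losing a variable when the formula is rewritten by hand.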
reg_goal5_all_new <- lm(goal5 ~ goal1 + goal2 + goal10 + goal13 + goal15 + goal17 + 
                          unemployment.rate + GDPpercapita + MilitaryExpenditurePercentGDP + 
                          internet_usage + pf_law + pf_security + pf_movement + pf_religion + 
                          pf_expression + pf_identity + ef_government + ef_money + 
                          ef_trade + ef_regulation, data = data_question1)
selmod <- step(reg_goal5_all_new, scope=list(lower=nullmod, upper=reg_goal5_all_new), direction="backward") 
#> Start:  AIC=10018
#> goal5 ~ goal1 + goal2 + goal10 + goal13 + goal15 + goal17 + unemployment.rate + 
#>     GDPpercapita + MilitaryExpenditurePercentGDP + internet_usage + 
#>     pf_law + pf_security + pf_movement + pf_religion + pf_expression + 
#>     pf_identity + ef_government + ef_money + ef_trade + ef_regulation
#> 
#>                                 Df Sum of Sq    RSS   AIC
#> - ef_trade                       1         9 196753 10016
#> - pf_expression                  1        70 196814 10017
#> - pf_religion                    1       109 196853 10017
#> - goal15                         1       124 196868 10018
#> - pf_law                         1       152 196896 10018
#> <none>                                       196744 10018
#> - pf_movement                    1       305 197049 10020
#> - ef_money                       1       832 197576 10026
#> - GDPpercapita                   1       914 197658 10027
#> - unemployment.rate              1      1368 198112 10032
#> - goal1                          1      1975 198719 10038
#> - pf_security                    1      3322 200067 10054
#> - MilitaryExpenditurePercentGDP  1      4132 200876 10063
#> - goal13                         1      4481 201225 10066
#> - ef_regulation                  1      6185 202929 10085
#> - goal10                         1      8132 204876 10106
#> - goal17                         1      8783 205527 10113
#> - ef_government                  1     10609 207353 10133
#> - goal2                          1     11903 208647 10147
#> - internet_usage                 1     11998 208742 10148
#> - pf_identity                    1     14494 211238 10174
#> 
#> Step:  AIC=10016
#> goal5 ~ goal1 + goal2 + goal10 + goal13 + goal15 + goal17 + unemployment.rate + 
#>     GDPpercapita + MilitaryExpenditurePercentGDP + internet_usage + 
#>     pf_law + pf_security + pf_movement + pf_religion + pf_expression + 
#>     pf_identity + ef_government + ef_money + ef_regulation
#> 
#>                                 Df Sum of Sq    RSS   AIC
#> - pf_expression                  1        74 196828 10015
#> - pf_religion                    1       110 196864 10016
#> - goal15                         1       120 196874 10016
#> - pf_law                         1       144 196898 10016
#> <none>                                       196753 10016
#> - pf_movement                    1       296 197049 10018
#> - GDPpercapita                   1       908 197662 10025
#> - ef_money                       1      1007 197761 10026
#> - unemployment.rate              1      1362 198116 10030
#> - goal1                          1      2102 198856 10038
#> - pf_security                    1      3342 200095 10052
#> - MilitaryExpenditurePercentGDP  1      4148 200901 10061
#> - goal13                         1      4499 201253 10065
#> - ef_regulation                  1      6340 203094 10085
#> - goal10                         1      8143 204897 10105
#> - goal17                         1      8841 205595 10112
#> - ef_government                  1     10700 207453 10132
#> - goal2                          1     11897 208650 10145
#> - internet_usage                 1     12175 208928 10148
#> - pf_identity                    1     14766 211519 10175
#> 
#> Step:  AIC=10015
#> goal5 ~ goal1 + goal2 + goal10 + goal13 + goal15 + goal17 + unemployment.rate + 
#>     GDPpercapita + MilitaryExpenditurePercentGDP + internet_usage + 
#>     pf_law + pf_security + pf_movement + pf_religion + pf_identity + 
#>     ef_government + ef_money + ef_regulation
#> 
#>                                 Df Sum of Sq    RSS   AIC
#> - pf_religion                    1        58 196886 10014
#> - pf_law                         1       103 196930 10014
#> - goal15                         1       120 196948 10015
#> <none>                                       196828 10015
#> - pf_movement                    1       240 197068 10016
#> - GDPpercapita                   1       931 197759 10024
#> - ef_money                       1       967 197795 10024
#> - unemployment.rate              1      1362 198189 10029
#> - goal1                          1      2119 198947 10037
#> - pf_security                    1      3390 200218 10051
#> - MilitaryExpenditurePercentGDP  1      4074 200901 10059
#> - goal13                         1      4433 201260 10063
#> - ef_regulation                  1      6448 203276 10085
#> - goal10                         1      8337 205165 10106
#> - goal17                         1      8767 205595 10110
#> - ef_government                  1     10770 207598 10132
#> - goal2                          1     11823 208651 10143
#> - internet_usage                 1     12525 209353 10151
#> - pf_identity                    1     14935 211763 10176
#> 
#> Step:  AIC=10014
#> goal5 ~ goal1 + goal2 + goal10 + goal13 + goal15 + goal17 + unemployment.rate + 
#>     GDPpercapita + MilitaryExpenditurePercentGDP + internet_usage + 
#>     pf_law + pf_security + pf_movement + pf_identity + ef_government + 
#>     ef_money + ef_regulation
#> 
#>                                 Df Sum of Sq    RSS   AIC
#> - goal15                         1       142 197028 10013
#> - pf_law                         1       175 197060 10014
#> <none>                                       196886 10014
#> - pf_movement                    1       520 197406 10018
#> - GDPpercapita                   1       917 197803 10022
#> - ef_money                       1       939 197824 10022
#> - unemployment.rate              1      1406 198292 10028
#> - goal1                          1      2439 199325 10039
#> - pf_security                    1      3530 200415 10051
#> - MilitaryExpenditurePercentGDP  1      4144 201029 10058
#> - goal13                         1      4393 201279 10061
#> - ef_regulation                  1      6404 203290 10083
#> - goal10                         1      8517 205403 10106
#> - goal17                         1      8870 205756 10110
#> - ef_government                  1     11195 208081 10135
#> - goal2                          1     11803 208689 10141
#> - internet_usage                 1     12526 209412 10149
#> - pf_identity                    1     15574 212460 10181
#> 
#> Step:  AIC=10013
#> goal5 ~ goal1 + goal2 + goal10 + goal13 + goal17 + unemployment.rate + 
#>     GDPpercapita + MilitaryExpenditurePercentGDP + internet_usage + 
#>     pf_law + pf_security + pf_movement + pf_identity + ef_government + 
#>     ef_money + ef_regulation
#> 
#>                                 Df Sum of Sq    RSS   AIC
#> <none>                                       197028 10013
#> - pf_law                         1       192 197220 10014
#> - pf_movement                    1       610 197638 10018
#> - ef_money                       1       935 197963 10022
#> - GDPpercapita                   1       996 198024 10023
#> - unemployment.rate              1      1272 198300 10026
#> - goal1                          1      2948 199976 10045
#> - pf_security                    1      3775 200804 10054
#> - MilitaryExpenditurePercentGDP  1      4183 201211 10058
#> - goal13                         1      4416 201444 10061
#> - ef_regulation                  1      6341 203369 10082
#> - goal10                         1      8686 205714 10108
#> - goal17                         1      8761 205789 10108
#> - ef_government                  1     11606 208635 10139
#> - goal2                          1     11669 208697 10140
#> - internet_usage                 1     13930 210958 10164
#> - pf_identity                    1     15976 213004 10185
vif(selmod) # pf_law has the highest VIF (5.62)
#>                         goal1                         goal2 
#>                          3.22                          1.76 
#>                        goal10                        goal13 
#>                          2.11                          4.32 
#>                        goal17             unemployment.rate 
#>                          2.06                          1.40 
#>                  GDPpercapita MilitaryExpenditurePercentGDP 
#>                          4.44                          1.44 
#>                internet_usage                        pf_law 
#>                          3.67                          5.62 
#>                   pf_security                   pf_movement 
#>                          2.09                          2.58 
#>                   pf_identity                 ef_government 
#>                          1.99                          1.78 
#>                      ef_money                 ef_regulation 
#>                          2.00                          2.20
reg_goal5_all_new <- lm(goal5 ~ goal1 + goal2 + goal10 + goal13 + goal15 + goal17 + unemployment.rate + 
                          GDPpercapita + MilitaryExpenditurePercentGDP + internet_usage + 
                          pf_security + pf_movement + pf_religion + pf_identity + 
                          ef_government + ef_money + ef_trade + ef_regulation, data = data_question1)
selmod <- step(reg_goal5_all_new, scope=list(lower=nullmod, upper=reg_goal5_all_new), direction="backward") 
#> Start:  AIC=10016
#> goal5 ~ goal1 + goal2 + goal10 + goal13 + goal15 + goal17 + unemployment.rate + 
#>     GDPpercapita + MilitaryExpenditurePercentGDP + internet_usage + 
#>     pf_security + pf_movement + pf_religion + pf_identity + ef_government + 
#>     ef_money + ef_trade + ef_regulation
#> 
#>                                 Df Sum of Sq    RSS   AIC
#> - ef_trade                       1         4 196930 10014
#> - goal15                         1       125 197052 10016
#> - pf_religion                    1       133 197059 10016
#> <none>                                       196927 10016
#> - pf_movement                    1       280 197207 10017
#> - ef_money                       1       780 197707 10023
#> - GDPpercapita                   1       830 197757 10024
#> - unemployment.rate              1      1267 198193 10029
#> - goal1                          1      1887 198813 10036
#> - pf_security                    1      3269 200195 10051
#> - MilitaryExpenditurePercentGDP  1      3961 200888 10059
#> - goal13                         1      5166 202093 10072
#> - ef_regulation                  1      7095 204021 10093
#> - goal10                         1      8241 205168 10106
#> - goal17                         1      8615 205542 10110
#> - goal2                          1     11904 208831 10145
#> - ef_government                  1     12198 209124 10148
#> - internet_usage                 1     12272 209198 10149
#> - pf_identity                    1     14710 211637 10175
#> 
#> Step:  AIC=10014
#> goal5 ~ goal1 + goal2 + goal10 + goal13 + goal15 + goal17 + unemployment.rate + 
#>     GDPpercapita + MilitaryExpenditurePercentGDP + internet_usage + 
#>     pf_security + pf_movement + pf_religion + pf_identity + ef_government + 
#>     ef_money + ef_regulation
#> 
#>                                 Df Sum of Sq    RSS   AIC
#> - goal15                         1       123 197053 10014
#> - pf_religion                    1       130 197060 10014
#> <none>                                       196930 10014
#> - pf_movement                    1       278 197208 10016
#> - GDPpercapita                   1       832 197763 10022
#> - ef_money                       1       985 197915 10023
#> - unemployment.rate              1      1267 198197 10027
#> - goal1                          1      2021 198952 10035
#> - pf_security                    1      3317 200247 10050
#> - MilitaryExpenditurePercentGDP  1      3978 200908 10057
#> - goal13                         1      5163 202093 10070
#> - ef_regulation                  1      7458 204389 10095
#> - goal10                         1      8253 205183 10104
#> - goal17                         1      8691 205621 10108
#> - goal2                          1     11917 208848 10143
#> - ef_government                  1     12215 209145 10146
#> - internet_usage                 1     12466 209397 10149
#> - pf_identity                    1     15022 211952 10176
#> 
#> Step:  AIC=10014
#> goal5 ~ goal1 + goal2 + goal10 + goal13 + goal17 + unemployment.rate + 
#>     GDPpercapita + MilitaryExpenditurePercentGDP + internet_usage + 
#>     pf_security + pf_movement + pf_religion + pf_identity + ef_government + 
#>     ef_money + ef_regulation
#> 
#>                                 Df Sum of Sq    RSS   AIC
#> - pf_religion                    1       167 197220 10014
#> <none>                                       197053 10014
#> - pf_movement                    1       301 197355 10015
#> - GDPpercapita                   1       905 197959 10022
#> - ef_money                       1       987 198041 10023
#> - unemployment.rate              1      1147 198201 10025
#> - goal1                          1      2363 199417 10038
#> - pf_security                    1      3511 200565 10051
#> - MilitaryExpenditurePercentGDP  1      4001 201055 10057
#> - goal13                         1      5203 202257 10070
#> - ef_regulation                  1      7407 204461 10094
#> - goal10                         1      8486 205540 10106
#> - goal17                         1      8585 205639 10107
#> - goal2                          1     11799 208853 10141
#> - ef_government                  1     12835 209888 10152
#> - internet_usage                 1     13800 210853 10162
#> - pf_identity                    1     15297 212350 10178
#> 
#> Step:  AIC=10014
#> goal5 ~ goal1 + goal2 + goal10 + goal13 + goal17 + unemployment.rate + 
#>     GDPpercapita + MilitaryExpenditurePercentGDP + internet_usage + 
#>     pf_security + pf_movement + pf_identity + ef_government + 
#>     ef_money + ef_regulation
#> 
#>                                 Df Sum of Sq    RSS   AIC
#> <none>                                       197220 10014
#> - GDPpercapita                   1       828 198048 10021
#> - ef_money                       1       939 198159 10022
#> - pf_movement                    1       988 198208 10023
#> - unemployment.rate              1      1123 198343 10024
#> - goal1                          1      2835 200055 10043
#> - pf_security                    1      3584 200804 10052
#> - MilitaryExpenditurePercentGDP  1      4042 201262 10057
#> - goal13                         1      5288 202508 10071
#> - ef_regulation                  1      7504 204724 10095
#> - goal10                         1      8565 205785 10106
#> - goal17                         1      8570 205790 10106
#> - goal2                          1     11766 208986 10141
#> - ef_government                  1     12758 209978 10151
#> - internet_usage                 1     13867 211088 10163
#> - pf_identity                    1     16411 213631 10190
vif(selmod) 
#>                         goal1                         goal2 
#>                          3.20                          1.76 
#>                        goal10                        goal13 
#>                          2.10                          4.01 
#>                        goal17             unemployment.rate 
#>                          2.01                          1.34 
#>                  GDPpercapita MilitaryExpenditurePercentGDP 
#>                          4.06                          1.42 
#>                internet_usage                   pf_security 
#>                          3.67                          1.97 
#>                   pf_movement                   pf_identity 
#>                          2.27                          1.98 
#>                 ef_government                      ef_money 
#>                          1.70                          2.00 
#>                 ef_regulation 
#>                          2.04
reg_goal5_all_new <- lm(goal5 ~ goal1 + goal2 + goal10 + goal13 + goal15 + goal17 + unemployment.rate + 
                          GDPpercapita + MilitaryExpenditurePercentGDP + internet_usage + 
                          pf_security + pf_movement + pf_religion + pf_identity + ef_government + 
                          ef_money + ef_regulation, data = data_question1)
#reg6
# Intercept-only lower bound for step(); only its right-hand side (~ 1) is used.
nullmod <- lm(goal6 ~ 1, data = data_question1)
selmod <- step(reg_goal6_all_new, scope=list(lower=nullmod, upper=reg_goal6_all_new), direction="backward") 
#> Start:  AIC=8457
#> goal6 ~ goal1 + goal2 + goal5 + goal8 + goal10 + goal13 + goal15 + 
#>     goal17 + unemployment.rate + GDPpercapita + MilitaryExpenditurePercentGDP + 
#>     internet_usage + pf_law + pf_security + pf_movement + pf_religion + 
#>     pf_expression + pf_identity + ef_government + ef_money + 
#>     ef_trade + ef_regulation + population
#> 
#>                                 Df Sum of Sq    RSS  AIC
#> - pf_movement                    1         0  97288 8455
#> - unemployment.rate              1        15  97303 8455
#> - MilitaryExpenditurePercentGDP  1        39  97327 8456
#> - ef_money                       1        49  97337 8456
#> <none>                                        97288 8457
#> - goal10                         1       158  97446 8458
#> - ef_regulation                  1       162  97450 8458
#> - pf_law                         1       179  97468 8459
#> - goal15                         1       196  97484 8459
#> - goal13                         1       226  97514 8460
#> - pf_religion                    1       228  97516 8460
#> - ef_government                  1       261  97549 8461
#> - GDPpercapita                   1       274  97562 8461
#> - goal17                         1       288  97576 8461
#> - pf_expression                  1       349  97637 8463
#> - population                     1       476  97764 8466
#> - ef_trade                       1       538  97826 8467
#> - pf_security                    1       642  97930 8469
#> - goal8                          1      1134  98422 8480
#> - internet_usage                 1      1937  99225 8499
#> - goal5                          1      2269  99557 8506
#> - pf_identity                    1      7190 104478 8613
#> - goal2                          1      7740 105028 8625
#> - goal1                          1     19069 116357 8853
#> 
#> Step:  AIC=8455
#> goal6 ~ goal1 + goal2 + goal5 + goal8 + goal10 + goal13 + goal15 + 
#>     goal17 + unemployment.rate + GDPpercapita + MilitaryExpenditurePercentGDP + 
#>     internet_usage + pf_law + pf_security + pf_religion + pf_expression + 
#>     pf_identity + ef_government + ef_money + ef_trade + ef_regulation + 
#>     population
#> 
#>                                 Df Sum of Sq    RSS  AIC
#> - unemployment.rate              1        15  97303 8453
#> - MilitaryExpenditurePercentGDP  1        41  97329 8454
#> - ef_money                       1        49  97337 8454
#> <none>                                        97288 8455
#> - goal10                         1       158  97447 8456
#> - ef_regulation                  1       163  97451 8456
#> - pf_law                         1       179  97468 8457
#> - goal15                         1       196  97484 8457
#> - goal13                         1       226  97514 8458
#> - pf_religion                    1       254  97542 8458
#> - ef_government                  1       261  97549 8459
#> - GDPpercapita                   1       274  97562 8459
#> - goal17                         1       295  97584 8459
#> - pf_expression                  1       371  97659 8461
#> - population                     1       477  97765 8464
#> - ef_trade                       1       549  97837 8465
#> - pf_security                    1       669  97957 8468
#> - goal8                          1      1135  98423 8478
#> - internet_usage                 1      1952  99241 8497
#> - goal5                          1      2273  99561 8504
#> - pf_identity                    1      7327 104615 8614
#> - goal2                          1      7804 105092 8624
#> - goal1                          1     19207 116495 8854
#> 
#> Step:  AIC=8453
#> goal6 ~ goal1 + goal2 + goal5 + goal8 + goal10 + goal13 + goal15 + 
#>     goal17 + GDPpercapita + MilitaryExpenditurePercentGDP + internet_usage + 
#>     pf_law + pf_security + pf_religion + pf_expression + pf_identity + 
#>     ef_government + ef_money + ef_trade + ef_regulation + population
#> 
#>                                 Df Sum of Sq    RSS  AIC
#> - MilitaryExpenditurePercentGDP  1        45  97349 8452
#> - ef_money                       1        51  97354 8452
#> <none>                                        97303 8453
#> - pf_law                         1       165  97468 8455
#> - ef_regulation                  1       170  97473 8455
#> - goal10                         1       185  97489 8455
#> - goal13                         1       230  97533 8456
#> - goal15                         1       233  97537 8456
#> - ef_government                  1       247  97550 8457
#> - pf_religion                    1       257  97560 8457
#> - goal17                         1       282  97586 8457
#> - GDPpercapita                   1       302  97605 8458
#> - pf_expression                  1       360  97663 8459
#> - population                     1       484  97787 8462
#> - ef_trade                       1       541  97844 8463
#> - pf_security                    1       670  97973 8466
#> - goal8                          1      1355  98658 8482
#> - internet_usage                 1      1945  99248 8495
#> - goal5                          1      2291  99595 8503
#> - pf_identity                    1      7327 104630 8613
#> - goal2                          1      7795 105099 8623
#> - goal1                          1     20727 118031 8881
#> 
#> Step:  AIC=8452
#> goal6 ~ goal1 + goal2 + goal5 + goal8 + goal10 + goal13 + goal15 + 
#>     goal17 + GDPpercapita + internet_usage + pf_law + pf_security + 
#>     pf_religion + pf_expression + pf_identity + ef_government + 
#>     ef_money + ef_trade + ef_regulation + population
#> 
#>                  Df Sum of Sq    RSS  AIC
#> - ef_money        1        51  97400 8451
#> <none>                         97349 8452
#> - pf_law          1       141  97490 8453
#> - ef_regulation   1       166  97514 8454
#> - goal10          1       178  97526 8454
#> - goal13          1       214  97563 8455
#> - ef_government   1       233  97581 8455
#> - goal15          1       236  97585 8455
#> - goal17          1       241  97589 8456
#> - pf_religion     1       260  97608 8456
#> - GDPpercapita    1       339  97688 8458
#> - pf_expression   1       433  97782 8460
#> - population      1       527  97876 8462
#> - ef_trade        1       536  97885 8462
#> - pf_security     1       633  97982 8464
#> - goal8           1      1348  98696 8481
#> - internet_usage  1      1946  99295 8494
#> - goal5           1      2459  99807 8506
#> - pf_identity     1      7453 104801 8614
#> - goal2           1      7904 105253 8624
#> - goal1           1     20724 118073 8880
#> 
#> Step:  AIC=8451
#> goal6 ~ goal1 + goal2 + goal5 + goal8 + goal10 + goal13 + goal15 + 
#>     goal17 + GDPpercapita + internet_usage + pf_law + pf_security + 
#>     pf_religion + pf_expression + pf_identity + ef_government + 
#>     ef_trade + ef_regulation + population
#> 
#>                  Df Sum of Sq    RSS  AIC
#> <none>                         97400 8451
#> - pf_law          1       129  97528 8452
#> - ef_regulation   1       154  97554 8453
#> - goal10          1       188  97587 8453
#> - ef_government   1       214  97614 8454
#> - goal13          1       217  97617 8454
#> - goal15          1       244  97644 8455
#> - pf_religion     1       247  97647 8455
#> - goal17          1       254  97654 8455
#> - GDPpercapita    1       354  97754 8457
#> - pf_expression   1       443  97843 8459
#> - population      1       520  97920 8461
#> - pf_security     1       640  98039 8464
#> - ef_trade        1       975  98375 8471
#> - goal8           1      1360  98760 8480
#> - internet_usage  1      2097  99497 8497
#> - goal5           1      2521  99920 8506
#> - pf_identity     1      7411 104811 8612
#> - goal2           1      8100 105500 8627
#> - goal1           1     20673 118073 8878
vif(selmod) 
#>          goal1          goal2          goal5          goal8 
#>           3.59           1.96           2.94           3.88 
#>         goal10         goal13         goal15         goal17 
#>           2.46           4.50           1.42           2.02 
#>   GDPpercapita internet_usage         pf_law    pf_security 
#>           4.33           4.19           6.40           2.06 
#>    pf_religion  pf_expression    pf_identity  ef_government 
#>           3.94           4.86           2.29           1.98 
#>       ef_trade  ef_regulation     population 
#>           3.06           2.35           1.50
reg_goal6_all_new <- lm(goal6 ~ goal1 + goal2 + goal5 + goal8 + unemployment.rate + GDPpercapita + 
                          MilitaryExpenditurePercentGDP + internet_usage + pf_security + 
                          pf_movement + pf_religion + pf_expression + pf_identity + 
                          ef_government + ef_money + population, data = data_question1)
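
The `vif()` tables above are read by eye to decide which predictors to drop before refitting. As a minimal sketch of the same check done programmatically, the block below computes variance inflation factors in base R from the inverse correlation matrix of the predictors. It uses the built-in `mtcars` data as a stand-in for `data_question1`, and the cutoff `vif_cutoff` is a hypothetical value, since the report does not state which threshold it applies.

```r
# Stand-in model: mtcars replaces data_question1 for illustration only.
fit <- lm(mpg ~ disp + hp + wt + qsec, data = mtcars)

# For numeric predictors, VIF_j is the j-th diagonal element of the
# inverse of the predictors' correlation matrix.
X <- model.matrix(fit)[, -1, drop = FALSE]  # drop the intercept column
vif_manual <- diag(solve(cor(X)))
names(vif_manual) <- colnames(X)

# Hypothetical cutoff (assumption, not taken from the report).
vif_cutoff <- 5
high_vif <- names(vif_manual)[vif_manual > vif_cutoff]

print(round(vif_manual, 2))
print(high_vif)
```

Predictors listed in `high_vif` would be the candidates for removal in the next refit. Each value equals 1 / (1 - R²) from regressing that predictor on the others.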

#reg8
nullmod <- lm(goal8 ~ 1, data = data_question1)
selmod <- step(reg_goal8_all_new, scope=list(lower=nullmod, upper=reg_goal8_all_new), direction="backward") 
#> Start:  AIC=6084
#> goal8 ~ goal1 + goal2 + goal5 + goal6 + goal7 + goal10 + goal13 + 
#>     goal15 + goal17 + unemployment.rate + GDPpercapita + MilitaryExpenditurePercentGDP + 
#>     internet_usage + pf_law + pf_security + pf_movement + pf_religion + 
#>     pf_expression + pf_identity + ef_government + ef_money + 
#>     ef_trade + ef_regulation + population
#> 
#>                                 Df Sum of Sq   RSS  AIC
#> - pf_security                    1         0 33474 6082
#> - goal17                         1         3 33477 6082
#> - goal15                         1         3 33478 6082
#> <none>                                       33474 6084
#> - MilitaryExpenditurePercentGDP  1        32 33506 6084
#> - ef_money                       1        56 33530 6085
#> - goal10                         1        60 33534 6086
#> - pf_identity                    1        68 33542 6086
#> - pf_movement                    1        68 33542 6086
#> - GDPpercapita                   1       116 33590 6089
#> - pf_religion                    1       132 33606 6090
#> - ef_regulation                  1       182 33657 6094
#> - goal6                          1       231 33706 6097
#> - goal5                          1       285 33759 6101
#> - goal7                          1       303 33777 6102
#> - ef_government                  1       355 33829 6105
#> - population                     1       487 33961 6114
#> - goal13                         1       539 34014 6117
#> - pf_expression                  1       871 34345 6139
#> - pf_law                         1       927 34401 6143
#> - ef_trade                       1      1095 34569 6153
#> - internet_usage                 1      1377 34851 6171
#> - goal2                          1      1779 35253 6197
#> - goal1                          1      2000 35475 6211
#> - unemployment.rate              1     10749 44224 6702
#> 
#> Step:  AIC=6082
#> goal8 ~ goal1 + goal2 + goal5 + goal6 + goal7 + goal10 + goal13 + 
#>     goal15 + goal17 + unemployment.rate + GDPpercapita + MilitaryExpenditurePercentGDP + 
#>     internet_usage + pf_law + pf_movement + pf_religion + pf_expression + 
#>     pf_identity + ef_government + ef_money + ef_trade + ef_regulation + 
#>     population
#> 
#>                                 Df Sum of Sq   RSS  AIC
#> - goal17                         1         3 33477 6080
#> - goal15                         1         3 33478 6080
#> <none>                                       33474 6082
#> - MilitaryExpenditurePercentGDP  1        32 33506 6082
#> - ef_money                       1        56 33530 6083
#> - goal10                         1        65 33540 6084
#> - pf_identity                    1        68 33542 6084
#> - pf_movement                    1        72 33546 6085
#> - GDPpercapita                   1       116 33591 6087
#> - pf_religion                    1       133 33607 6089
#> - ef_regulation                  1       182 33657 6092
#> - goal6                          1       233 33707 6095
#> - goal5                          1       288 33762 6099
#> - goal7                          1       303 33777 6100
#> - ef_government                  1       366 33840 6104
#> - population                     1       487 33961 6112
#> - goal13                         1       539 34014 6115
#> - pf_expression                  1       872 34346 6137
#> - pf_law                         1       987 34462 6144
#> - ef_trade                       1      1098 34573 6152
#> - internet_usage                 1      1394 34868 6171
#> - goal2                          1      1852 35326 6200
#> - goal1                          1      2001 35475 6209
#> - unemployment.rate              1     10749 44224 6700
#> 
#> Step:  AIC=6080
#> goal8 ~ goal1 + goal2 + goal5 + goal6 + goal7 + goal10 + goal13 + 
#>     goal15 + unemployment.rate + GDPpercapita + MilitaryExpenditurePercentGDP + 
#>     internet_usage + pf_law + pf_movement + pf_religion + pf_expression + 
#>     pf_identity + ef_government + ef_money + ef_trade + ef_regulation + 
#>     population
#> 
#>                                 Df Sum of Sq   RSS  AIC
#> - goal15                         1         3 33480 6078
#> <none>                                       33477 6080
#> - MilitaryExpenditurePercentGDP  1        42 33519 6081
#> - ef_money                       1        54 33531 6082
#> - goal10                         1        63 33540 6082
#> - pf_identity                    1        66 33543 6082
#> - pf_movement                    1        77 33554 6083
#> - GDPpercapita                   1       114 33591 6085
#> - pf_religion                    1       131 33608 6087
#> - ef_regulation                  1       181 33658 6090
#> - goal6                          1       231 33709 6093
#> - goal7                          1       300 33777 6098
#> - goal5                          1       308 33785 6098
#> - ef_government                  1       398 33875 6104
#> - population                     1       495 33973 6111
#> - goal13                         1       549 34026 6114
#> - pf_expression                  1       913 34390 6138
#> - pf_law                         1      1011 34488 6144
#> - ef_trade                       1      1097 34574 6150
#> - internet_usage                 1      1406 34883 6170
#> - goal2                          1      1851 35328 6198
#> - goal1                          1      2058 35535 6211
#> - unemployment.rate              1     10921 44398 6706
#> 
#> Step:  AIC=6078
#> goal8 ~ goal1 + goal2 + goal5 + goal6 + goal7 + goal10 + goal13 + 
#>     unemployment.rate + GDPpercapita + MilitaryExpenditurePercentGDP + 
#>     internet_usage + pf_law + pf_movement + pf_religion + pf_expression + 
#>     pf_identity + ef_government + ef_money + ef_trade + ef_regulation + 
#>     population
#> 
#>                                 Df Sum of Sq   RSS  AIC
#> <none>                                       33480 6078
#> - MilitaryExpenditurePercentGDP  1        41 33521 6079
#> - ef_money                       1        55 33535 6080
#> - pf_identity                    1        70 33550 6081
#> - goal10                         1        76 33556 6081
#> - pf_movement                    1        77 33557 6081
#> - GDPpercapita                   1       119 33599 6084
#> - pf_religion                    1       133 33613 6085
#> - ef_regulation                  1       186 33666 6088
#> - goal6                          1       232 33712 6091
#> - goal5                          1       312 33792 6097
#> - goal7                          1       315 33795 6097
#> - ef_government                  1       406 33886 6103
#> - population                     1       495 33975 6109
#> - goal13                         1       549 34029 6112
#> - pf_expression                  1       915 34395 6136
#> - pf_law                         1      1011 34491 6142
#> - ef_trade                       1      1110 34590 6149
#> - internet_usage                 1      1521 35001 6175
#> - goal2                          1      1848 35328 6196
#> - goal1                          1      2064 35544 6209
#> - unemployment.rate              1     11598 45078 6738
vif(selmod)  # goal6 flagged for removal in the next refit
#>                         goal1                         goal2 
#>                          6.03                          1.99 
#>                         goal5                         goal6 
#>                          3.05                          5.72 
#>                         goal7                        goal10 
#>                          6.48                          2.14 
#>                        goal13             unemployment.rate 
#>                          4.57                          1.42 
#>                  GDPpercapita MilitaryExpenditurePercentGDP 
#>                          4.41                          1.36 
#>                internet_usage                        pf_law 
#>                          4.13                          6.39 
#>                   pf_movement                   pf_religion 
#>                          3.73                          4.29 
#>                 pf_expression                   pf_identity 
#>                          5.16                          2.46 
#>                 ef_government                      ef_money 
#>                          1.94                          2.74 
#>                      ef_trade                 ef_regulation 
#>                          4.15                          2.51 
#>                    population 
#>                          1.41
reg_goal8_all_new <- lm(goal8 ~ goal1 + goal2 + goal13 + goal15 + unemployment.rate + 
                          internet_usage + pf_law + pf_security + pf_movement + pf_religion + 
                          pf_expression + pf_identity + ef_government + ef_trade + 
                          population, data = data_question1)
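
The same three calls (null model, `step()`, `vif()`) are repeated for every goal. A possible way to reduce the repetition is sketched below, assuming a small helper named `backward_select` (hypothetical, not part of the report). It passes the lower bound of the scope as the formula `~ 1`, which `step()` accepts, so no per-goal null model has to be fitted. The built-in `mtcars` data stands in for `data_question1`.

```r
# Hypothetical helper: backward elimination by AIC from a full lm fit.
backward_select <- function(full_model) {
  step(
    full_model,
    scope = list(lower = ~ 1, upper = formula(full_model)),
    direction = "backward",
    trace = 0  # suppress the per-step tables
  )
}

# Stand-in full model; in the report this would be e.g. reg_goal8_all_new.
full <- lm(mpg ~ disp + hp + wt + qsec + drat, data = mtcars)
selected <- backward_select(full)

# Terms kept after selection.
print(attr(terms(selected), "term.labels"))
```

Setting `trace = 0` hides the intermediate tables. Leaving it at the default reproduces output of the kind shown above.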

#reg10
nullmod <- lm(goal10 ~ 1, data = data_question1)
selmod <- step(reg_goal10_all_new, scope=list(lower=nullmod, upper=reg_goal10_all_new), direction="backward") 
#> Start:  AIC=12896
#> goal10 ~ goal1 + goal2 + goal5 + goal6 + goal7 + goal8 + goal13 + 
#>     goal15 + goal17 + unemployment.rate + GDPpercapita + MilitaryExpenditurePercentGDP + 
#>     internet_usage + pf_law + pf_security + pf_movement + pf_religion + 
#>     pf_expression + pf_identity + ef_government + ef_money + 
#>     ef_trade + ef_regulation + population
#> 
#>                                 Df Sum of Sq    RSS   AIC
#> - goal7                          1        49 714071 12894
#> - goal2                          1       609 714630 12895
#> <none>                                       714022 12896
#> - pf_law                         1       899 714920 12896
#> - goal6                          1      1001 715023 12897
#> - ef_money                       1      1163 715185 12897
#> - goal8                          1      1281 715303 12898
#> - MilitaryExpenditurePercentGDP  1      1695 715717 12899
#> - pf_movement                    1      1836 715858 12899
#> - internet_usage                 1      2014 716035 12900
#> - pf_identity                    1      2017 716039 12900
#> - ef_regulation                  1      2221 716243 12901
#> - GDPpercapita                   1      2816 716838 12902
#> - ef_trade                       1      8417 722439 12920
#> - goal13                         1      8959 722981 12921
#> - pf_expression                  1     11119 725140 12928
#> - ef_government                  1     12549 726571 12932
#> - goal17                         1     20482 734504 12957
#> - pf_religion                    1     21331 735353 12959
#> - population                     1     23971 737993 12967
#> - goal1                          1     24763 738785 12969
#> - unemployment.rate              1     27983 742005 12979
#> - goal5                          1     28394 742416 12980
#> - pf_security                    1     49219 763241 13042
#> - goal15                         1     59779 773801 13073
#> 
#> Step:  AIC=12894
#> goal10 ~ goal1 + goal2 + goal5 + goal6 + goal8 + goal13 + goal15 + 
#>     goal17 + unemployment.rate + GDPpercapita + MilitaryExpenditurePercentGDP + 
#>     internet_usage + pf_law + pf_security + pf_movement + pf_religion + 
#>     pf_expression + pf_identity + ef_government + ef_money + 
#>     ef_trade + ef_regulation + population
#> 
#>                                 Df Sum of Sq    RSS   AIC
#> <none>                                       714071 12894
#> - goal2                          1       650 714720 12894
#> - pf_law                         1      1011 715081 12895
#> - ef_money                       1      1138 715209 12895
#> - goal6                          1      1162 715233 12895
#> - goal8                          1      1246 715317 12896
#> - MilitaryExpenditurePercentGDP  1      1699 715769 12897
#> - pf_movement                    1      1804 715875 12897
#> - pf_identity                    1      1993 716064 12898
#> - ef_regulation                  1      2192 716262 12899
#> - internet_usage                 1      2240 716311 12899
#> - GDPpercapita                   1      2786 716856 12900
#> - ef_trade                       1      8405 722475 12918
#> - goal13                         1      8966 723037 12920
#> - pf_expression                  1     11075 725145 12926
#> - ef_government                  1     12531 726601 12930
#> - goal17                         1     20466 734536 12955
#> - pf_religion                    1     21458 735529 12958
#> - population                     1     23930 738001 12965
#> - unemployment.rate              1     28006 742077 12977
#> - goal5                          1     29080 743151 12981
#> - goal1                          1     36196 750267 13002
#> - pf_security                    1     49171 763242 13040
#> - goal15                         1     60474 774545 13073
vif(selmod)  # goal6 flagged for removal in the next refit
#>                         goal1                         goal2 
#>                          4.56                          2.17 
#>                         goal5                         goal6 
#>                          2.99                          5.50 
#>                         goal8                        goal13 
#>                          5.28                          4.51 
#>                        goal15                        goal17 
#>                          1.39                          2.26 
#>             unemployment.rate                  GDPpercapita 
#>                          1.96                          4.53 
#> MilitaryExpenditurePercentGDP                internet_usage 
#>                          1.52                          4.41 
#>                        pf_law                   pf_security 
#>                          7.04                          2.07 
#>                   pf_movement                   pf_religion 
#>                          3.96                          4.25 
#>                 pf_expression                   pf_identity 
#>                          5.35                          2.52 
#>                 ef_government                      ef_money 
#>                          2.06                          2.75 
#>                      ef_trade                 ef_regulation 
#>                          4.21                          2.39 
#>                    population 
#>                          1.49
reg_goal10_all_new <- lm(goal10 ~ goal1 + goal2 + goal5 + goal8 + goal13 + goal15 + 
                           goal17 + unemployment.rate + GDPpercapita + internet_usage + 
                           pf_law + pf_security + pf_movement + pf_religion + pf_expression + 
                           ef_government + ef_money + ef_trade + ef_regulation + population, data = data_question1)
selmod <- step(reg_goal10_all_new, scope=list(lower=nullmod, upper=reg_goal10_all_new), direction="backward") 
#> Start:  AIC=12901
#> goal10 ~ goal1 + goal2 + goal5 + goal8 + goal13 + goal15 + goal17 + 
#>     unemployment.rate + GDPpercapita + internet_usage + pf_law + 
#>     pf_security + pf_movement + pf_religion + pf_expression + 
#>     ef_government + ef_money + ef_trade + ef_regulation + population
#> 
#>                     Df Sum of Sq    RSS   AIC
#> <none>                           718406 12901
#> - goal2              1      1049 719455 12902
#> - goal8              1      1083 719489 12903
#> - ef_money           1      1343 719749 12903
#> - pf_law             1      1538 719943 12904
#> - GDPpercapita       1      2523 720928 12907
#> - internet_usage     1      2616 721022 12907
#> - ef_regulation      1      2811 721216 12908
#> - pf_movement        1      3334 721740 12910
#> - ef_trade           1      9338 727743 12928
#> - goal13             1      9386 727791 12928
#> - pf_expression      1     11617 730023 12935
#> - ef_government      1     13162 731568 12940
#> - goal17             1     17673 736079 12953
#> - pf_religion        1     24266 742671 12973
#> - population         1     27089 745494 12982
#> - unemployment.rate  1     27438 745843 12983
#> - goal5              1     37852 756258 13014
#> - pf_security        1     50340 768746 13050
#> - goal1              1     51891 770296 13054
#> - goal15             1     59277 777683 13076
vif(selmod)  # pf_law has the highest VIF and is removed in the next refit
#>             goal1             goal2             goal5 
#>              3.55              1.99              2.63 
#>             goal8            goal13            goal15 
#>              5.22              4.42              1.38 
#>            goal17 unemployment.rate      GDPpercapita 
#>              2.05              1.95              4.45 
#>    internet_usage            pf_law       pf_security 
#>              4.32              6.88              2.01 
#>       pf_movement       pf_religion     pf_expression 
#>              3.72              4.04              5.13 
#>     ef_government          ef_money          ef_trade 
#>              2.04              2.74              4.06 
#>     ef_regulation        population 
#>              2.31              1.41
reg_goal10_all_new <- lm(goal10 ~ goal1 + goal2 + goal5 + goal8 + goal13 + goal15 + goal17 + 
                           unemployment.rate + GDPpercapita + internet_usage + 
                           pf_security + pf_movement + pf_religion + pf_expression + 
                           ef_government + ef_money + ef_trade + ef_regulation + population, data = data_question1)
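
For reading the tables above: the `AIC` column that `step()` prints for a linear model is the value returned by `extractAIC()`, namely n · log(RSS / n) + 2 · edf, where edf is the number of estimated coefficients. A term is dropped only when doing so lowers this value below the `<none>` row. The sketch below checks the formula by hand on a stand-in model (built-in `mtcars` data, not `data_question1`).

```r
# Stand-in model for illustration only.
fit <- lm(mpg ~ disp + hp + wt + qsec, data = mtcars)

n   <- nrow(mtcars)
rss <- sum(residuals(fit)^2)
edf <- n - df.residual(fit)  # number of estimated coefficients

# AIC as used by step() for lm fits (up to an additive constant
# relative to stats::AIC()).
aic_by_hand <- n * log(rss / n) + 2 * edf
aic_step    <- extractAIC(fit)[2]

print(all.equal(aic_by_hand, aic_step))

# drop1() reproduces one step() table: Df, Sum of Sq, RSS, AIC.
print(drop1(fit))
```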

#reg13
nullmod <- lm(goal13 ~ 1, data = data_question1)
selmod <- step(reg_goal13_all_new, scope=list(lower=nullmod, upper=reg_goal13_all_new), direction="backward") 
#> Start:  AIC=8827
#> goal13 ~ goal1 + goal2 + goal5 + goal6 + goal7 + goal8 + goal10 + 
#>     goal11 + goal12 + goal15 + goal17 + unemployment.rate + GDPpercapita + 
#>     MilitaryExpenditurePercentGDP + internet_usage + pf_law + 
#>     pf_security + pf_movement + pf_religion + pf_expression + 
#>     pf_identity + ef_government + ef_money + ef_trade + ef_regulation + 
#>     population
#> 
#>                                 Df Sum of Sq    RSS  AIC
#> - pf_movement                    1         0 114606 8825
#> - pf_law                         1         0 114606 8825
#> - pf_security                    1         2 114607 8825
#> - goal10                         1        50 114655 8826
#> <none>                                       114605 8827
#> - ef_money                       1       115 114720 8828
#> - ef_government                  1       121 114727 8828
#> - pf_identity                    1       140 114746 8828
#> - goal1                          1       181 114787 8829
#> - pf_expression                  1       241 114847 8830
#> - goal5                          1       267 114873 8830
#> - unemployment.rate              1       297 114903 8831
#> - internet_usage                 1       393 114999 8833
#> - pf_religion                    1       425 115030 8834
#> - population                     1       449 115054 8834
#> - goal7                          1       697 115303 8839
#> - goal6                          1       907 115512 8843
#> - goal2                          1      1106 115711 8847
#> - ef_regulation                  1      1193 115799 8848
#> - ef_trade                       1      1344 115950 8851
#> - goal15                         1      1581 116186 8856
#> - GDPpercapita                   1      1638 116244 8857
#> - goal11                         1      1807 116413 8860
#> - MilitaryExpenditurePercentGDP  1      2123 116729 8866
#> - goal8                          1      2847 117453 8880
#> - goal17                         1      3391 117996 8890
#> - goal12                         1     56254 170859 9714
#> 
#> Step:  AIC=8825
#> goal13 ~ goal1 + goal2 + goal5 + goal6 + goal7 + goal8 + goal10 + 
#>     goal11 + goal12 + goal15 + goal17 + unemployment.rate + GDPpercapita + 
#>     MilitaryExpenditurePercentGDP + internet_usage + pf_law + 
#>     pf_security + pf_religion + pf_expression + pf_identity + 
#>     ef_government + ef_money + ef_trade + ef_regulation + population
#> 
#>                                 Df Sum of Sq    RSS  AIC
#> - pf_law                         1         0 114606 8823
#> - pf_security                    1         2 114608 8823
#> - goal10                         1        50 114656 8824
#> <none>                                       114606 8825
#> - ef_money                       1       115 114720 8826
#> - ef_government                  1       121 114727 8826
#> - pf_identity                    1       141 114747 8826
#> - goal1                          1       181 114787 8827
#> - pf_expression                  1       258 114863 8828
#> - goal5                          1       267 114873 8828
#> - unemployment.rate              1       297 114903 8829
#> - internet_usage                 1       396 115002 8831
#> - population                     1       449 115055 8832
#> - pf_religion                    1       477 115082 8833
#> - goal7                          1       698 115304 8837
#> - goal6                          1       908 115514 8841
#> - goal2                          1      1114 115719 8845
#> - ef_regulation                  1      1201 115807 8847
#> - ef_trade                       1      1369 115975 8850
#> - goal15                         1      1594 116199 8854
#> - GDPpercapita                   1      1640 116246 8855
#> - goal11                         1      1835 116441 8859
#> - MilitaryExpenditurePercentGDP  1      2221 116826 8866
#> - goal8                          1      2848 117454 8878
#> - goal17                         1      3452 118057 8889
#> - goal12                         1     56468 171074 9715
#> 
#> Step:  AIC=8823
#> goal13 ~ goal1 + goal2 + goal5 + goal6 + goal7 + goal8 + goal10 + 
#>     goal11 + goal12 + goal15 + goal17 + unemployment.rate + GDPpercapita + 
#>     MilitaryExpenditurePercentGDP + internet_usage + pf_security + 
#>     pf_religion + pf_expression + pf_identity + ef_government + 
#>     ef_money + ef_trade + ef_regulation + population
#> 
#>                                 Df Sum of Sq    RSS  AIC
#> - pf_security                    1         2 114608 8821
#> - goal10                         1        50 114656 8822
#> <none>                                       114606 8823
#> - ef_money                       1       116 114722 8824
#> - ef_government                  1       124 114730 8824
#> - pf_identity                    1       141 114747 8824
#> - goal1                          1       188 114793 8825
#> - pf_expression                  1       263 114869 8826
#> - goal5                          1       268 114874 8827
#> - unemployment.rate              1       318 114923 8827
#> - internet_usage                 1       398 115004 8829
#> - population                     1       449 115055 8830
#> - pf_religion                    1       485 115091 8831
#> - goal7                          1       712 115318 8835
#> - goal6                          1       908 115514 8839
#> - goal2                          1      1114 115720 8843
#> - ef_regulation                  1      1312 115918 8847
#> - ef_trade                       1      1370 115975 8848
#> - goal15                         1      1595 116200 8852
#> - GDPpercapita                   1      1664 116270 8853
#> - goal11                         1      1841 116447 8857
#> - MilitaryExpenditurePercentGDP  1      2288 116894 8865
#> - goal8                          1      2939 117544 8878
#> - goal17                         1      3487 118093 8888
#> - goal12                         1     66252 180858 9837
#> 
#> Step:  AIC=8821
#> goal13 ~ goal1 + goal2 + goal5 + goal6 + goal7 + goal8 + goal10 + 
#>     goal11 + goal12 + goal15 + goal17 + unemployment.rate + GDPpercapita + 
#>     MilitaryExpenditurePercentGDP + internet_usage + pf_religion + 
#>     pf_expression + pf_identity + ef_government + ef_money + 
#>     ef_trade + ef_regulation + population
#> 
#>                                 Df Sum of Sq    RSS  AIC
#> - goal10                         1        48 114656 8820
#> <none>                                       114608 8821
#> - ef_money                       1       115 114723 8822
#> - ef_government                  1       138 114745 8822
#> - pf_identity                    1       144 114752 8822
#> - goal1                          1       188 114795 8823
#> - pf_expression                  1       274 114882 8825
#> - goal5                          1       277 114885 8825
#> - unemployment.rate              1       316 114924 8825
#> - internet_usage                 1       404 115012 8827
#> - population                     1       450 115058 8828
#> - pf_religion                    1       485 115092 8829
#> - goal7                          1       715 115323 8833
#> - goal6                          1       920 115528 8837
#> - goal2                          1      1163 115771 8842
#> - ef_regulation                  1      1313 115920 8845
#> - ef_trade                       1      1390 115998 8846
#> - goal15                         1      1602 116210 8850
#> - GDPpercapita                   1      1668 116276 8852
#> - goal11                         1      1840 116447 8855
#> - MilitaryExpenditurePercentGDP  1      2334 116942 8864
#> - goal8                          1      2939 117547 8876
#> - goal17                         1      3509 118116 8886
#> - goal12                         1     66357 180965 9836
#> 
#> Step:  AIC=8820
#> goal13 ~ goal1 + goal2 + goal5 + goal6 + goal7 + goal8 + goal11 + 
#>     goal12 + goal15 + goal17 + unemployment.rate + GDPpercapita + 
#>     MilitaryExpenditurePercentGDP + internet_usage + pf_religion + 
#>     pf_expression + pf_identity + ef_government + ef_money + 
#>     ef_trade + ef_regulation + population
#> 
#>                                 Df Sum of Sq    RSS  AIC
#> <none>                                       114656 8820
#> - ef_government                  1       109 114765 8820
#> - ef_money                       1       111 114767 8820
#> - pf_identity                    1       135 114791 8821
#> - goal1                          1       160 114816 8821
#> - goal5                          1       243 114899 8823
#> - pf_expression                  1       246 114902 8823
#> - unemployment.rate              1       284 114940 8824
#> - internet_usage                 1       385 115041 8826
#> - population                     1       408 115064 8826
#> - pf_religion                    1       577 115233 8829
#> - goal7                          1       693 115349 8832
#> - goal6                          1       935 115591 8836
#> - goal2                          1      1134 115790 8840
#> - ef_regulation                  1      1395 116051 8845
#> - ef_trade                       1      1463 116119 8847
#> - goal15                         1      1554 116210 8848
#> - GDPpercapita                   1      1691 116347 8851
#> - goal11                         1      2026 116682 8857
#> - MilitaryExpenditurePercentGDP  1      2383 117039 8864
#> - goal8                          1      3037 117693 8876
#> - goal17                         1      3477 118133 8885
#> - goal12                         1     68748 183404 9864
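
The AIC column in the tables above is the criterion that `step()` minimises. For a linear model it is computed by `extractAIC()` as n·log(RSS/n) + 2·edf, which differs from `AIC()` by an additive constant. The sketch below reproduces that value by hand and prints a single backward-step table with `drop1()`. It uses the built-in `mtcars` data and an arbitrary formula purely for illustration; neither is part of this analysis.

```r
# Illustrative data only: mtcars ships with base R and stands in for data_question1
fit <- lm(mpg ~ wt + hp + qsec + drat, data = mtcars)

n   <- nrow(mtcars)
rss <- sum(residuals(fit)^2)
edf <- length(coef(fit))            # estimated coefficients, intercept included

aic_by_hand <- n * log(rss / n) + 2 * edf
aic_step    <- extractAIC(fit)[2]   # the value step() reports for this model

# One backward step: RSS and AIC of the model after deleting each term in turn
tab <- drop1(fit)
print(tab)
```

A term whose row shows a lower AIC than the `<none>` row is a candidate for removal, and backward selection stops once `<none>` has the lowest AIC.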
# Variance inflation factors of the model retained by step()
vif(selmod)
#>                         goal1                         goal2 
#>                          7.60                          2.10 
#>                         goal5                         goal6 
#>                          3.31                          5.73 
#>                         goal7                         goal8 
#>                          7.05                          5.11 
#>                        goal11                        goal12 
#>                          6.31                          9.68 
#>                        goal15                        goal17 
#>                          1.57                          2.24 
#>             unemployment.rate                  GDPpercapita 
#>                          1.90                          5.14 
#> MilitaryExpenditurePercentGDP                internet_usage 
#>                          1.40                          4.49 
#>                   pf_religion                 pf_expression 
#>                          3.73                          4.90 
#>                   pf_identity                 ef_government 
#>                          2.55                          1.92 
#>                      ef_money                      ef_trade 
#>                          2.76                          4.23 
#>                 ef_regulation                    population 
#>                          2.33                          1.62
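
For reference, the variance inflation factor of a predictor is 1 / (1 - R²), where R² comes from regressing that predictor on all the other predictors. The following is a minimal base-R sketch of that definition, assuming a model with numeric predictors only. The helper name `vif_by_hand` and the `mtcars` example are illustrative assumptions, not part of the original analysis.

```r
# Illustrative data only: mtcars stands in for data_question1
fit <- lm(mpg ~ wt + hp + qsec + drat, data = mtcars)

# Hypothetical helper: VIF_j = 1 / (1 - R_j^2), with R_j^2 obtained by
# regressing predictor j on the remaining predictors
vif_by_hand <- function(model) {
  X <- model.matrix(model)[, -1, drop = FALSE]   # drop the intercept column
  vapply(colnames(X), function(v) {
    others <- X[, colnames(X) != v, drop = FALSE]
    r2 <- summary(lm(X[, v] ~ others))$r.squared
    1 / (1 - r2)
  }, numeric(1))
}

vifs <- round(vif_by_hand(fit), 2)
print(vifs)
```

Values of 5 or 10 are common rules of thumb for flagging problematic collinearity; the choice of threshold is a judgment call.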
# Refit goal13 on a reduced predictor set
reg_goal13_all_new <- lm(goal13 ~ goal1 + goal2 + goal5 + goal8 + goal10 + goal17 + unemployment.rate + 
                           GDPpercapita + MilitaryExpenditurePercentGDP + internet_usage + 
                           pf_law + pf_religion + ef_government + ef_regulation, data = data_question1)

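# reg15: backward stepwise selection for goal15
# The intercept-only lower bound uses the same response as the full model
# (step() only reads the right-hand side of the scope, so the output is unchanged)
nullmod <- lm(goal15 ~ 1, data = data_question1)
selmod <- step(reg_goal15_all_new, scope = list(lower = nullmod, upper = reg_goal15_all_new), direction = "backward")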
#> Start:  AIC=10430
#> goal15 ~ goal1 + goal2 + goal5 + goal6 + goal7 + goal8 + goal10 + 
#>     goal11 + goal12 + goal13 + goal17 + unemployment.rate + GDPpercapita + 
#>     MilitaryExpenditurePercentGDP + internet_usage + pf_law + 
#>     pf_security + pf_movement + pf_religion + pf_expression + 
#>     pf_identity + ef_government + ef_money + ef_trade + ef_regulation + 
#>     population
#> 
#>                                 Df Sum of Sq    RSS   AIC
#> - pf_religion                    1         0 235446 10428
#> - goal6                          1        13 235459 10428
#> - ef_regulation                  1        43 235489 10428
#> - pf_expression                  1        56 235502 10429
#> - pf_law                         1        58 235504 10429
#> - ef_money                       1       204 235649 10430
#> <none>                                       235446 10430
#> - goal2                          1       228 235674 10430
#> - ef_trade                       1       236 235681 10430
#> - MilitaryExpenditurePercentGDP  1       399 235845 10432
#> - goal7                          1       613 236058 10434
#> - goal8                          1       663 236109 10434
#> - goal1                          1       972 236418 10437
#> - goal17                         1      1333 236779 10441
#> - pf_movement                    1      1567 237013 10443
#> - goal5                          1      1759 237204 10445
#> - pf_identity                    1      2755 238201 10454
#> - goal13                         1      3247 238693 10458
#> - pf_security                    1      3394 238840 10460
#> - GDPpercapita                   1      3583 239028 10462
#> - ef_government                  1      4554 240000 10471
#> - goal12                         1      6226 241671 10486
#> - population                     1      9707 245152 10518
#> - goal10                         1     10282 245728 10523
#> - internet_usage                 1     14108 249554 10558
#> - unemployment.rate              1     15446 250892 10569
#> - goal11                         1     16731 252176 10581
#> 
#> Step:  AIC=10428
#> goal15 ~ goal1 + goal2 + goal5 + goal6 + goal7 + goal8 + goal10 + 
#>     goal11 + goal12 + goal13 + goal17 + unemployment.rate + GDPpercapita + 
#>     MilitaryExpenditurePercentGDP + internet_usage + pf_law + 
#>     pf_security + pf_movement + pf_expression + pf_identity + 
#>     ef_government + ef_money + ef_trade + ef_regulation + population
#> 
#>                                 Df Sum of Sq    RSS   AIC
#> - goal6                          1        13 235459 10426
#> - ef_regulation                  1        43 235489 10426
#> - pf_law                         1        58 235504 10427
#> - pf_expression                  1        73 235519 10427
#> - ef_money                       1       205 235651 10428
#> <none>                                       235446 10428
#> - goal2                          1       229 235675 10428
#> - ef_trade                       1       236 235682 10428
#> - MilitaryExpenditurePercentGDP  1       400 235846 10430
#> - goal7                          1       613 236059 10432
#> - goal8                          1       668 236113 10432
#> - goal1                          1       986 236432 10435
#> - goal17                         1      1348 236794 10439
#> - pf_movement                    1      1750 237196 10442
#> - goal5                          1      1766 237212 10443
#> - pf_identity                    1      2881 238327 10453
#> - goal13                         1      3261 238707 10457
#> - pf_security                    1      3434 238880 10458
#> - GDPpercapita                   1      3586 239031 10460
#> - ef_government                  1      4720 240166 10470
#> - goal12                         1      6229 241675 10484
#> - goal10                         1     10644 246090 10524
#> - population                     1     11710 247156 10534
#> - internet_usage                 1     14126 249572 10556
#> - unemployment.rate              1     15452 250897 10567
#> - goal11                         1     16986 252432 10581
#> 
#> Step:  AIC=10426
#> goal15 ~ goal1 + goal2 + goal5 + goal7 + goal8 + goal10 + goal11 + 
#>     goal12 + goal13 + goal17 + unemployment.rate + GDPpercapita + 
#>     MilitaryExpenditurePercentGDP + internet_usage + pf_law + 
#>     pf_security + pf_movement + pf_expression + pf_identity + 
#>     ef_government + ef_money + ef_trade + ef_regulation + population
#> 
#>                                 Df Sum of Sq    RSS   AIC
#> - ef_regulation                  1        43 235502 10425
#> - pf_law                         1        58 235517 10425
#> - pf_expression                  1        81 235540 10425
#> - ef_money                       1       202 235661 10426
#> <none>                                       235459 10426
#> - goal2                          1       216 235675 10426
#> - ef_trade                       1       241 235700 10426
#> - MilitaryExpenditurePercentGDP  1       397 235856 10428
#> - goal7                          1       601 236060 10430
#> - goal8                          1       656 236115 10430
#> - goal1                          1       978 236437 10433
#> - goal17                         1      1343 236802 10437
#> - pf_movement                    1      1749 237208 10441
#> - goal5                          1      1809 237268 10441
#> - pf_identity                    1      3191 238650 10454
#> - goal13                         1      3250 238709 10455
#> - pf_security                    1      3488 238948 10457
#> - GDPpercapita                   1      3573 239032 10458
#> - ef_government                  1      4763 240222 10469
#> - goal12                         1      6217 241676 10482
#> - goal10                         1     10692 246152 10523
#> - population                     1     11889 247348 10534
#> - internet_usage                 1     14327 249786 10556
#> - unemployment.rate              1     15440 250899 10566
#> - goal11                         1     16984 252443 10579
#> 
#> Step:  AIC=10425
#> goal15 ~ goal1 + goal2 + goal5 + goal7 + goal8 + goal10 + goal11 + 
#>     goal12 + goal13 + goal17 + unemployment.rate + GDPpercapita + 
#>     MilitaryExpenditurePercentGDP + internet_usage + pf_law + 
#>     pf_security + pf_movement + pf_expression + pf_identity + 
#>     ef_government + ef_money + ef_trade + population
#> 
#>                                 Df Sum of Sq    RSS   AIC
#> - pf_expression                  1        89 235592 10423
#> - pf_law                         1        97 235599 10423
#> - goal2                          1       205 235707 10424
#> - ef_trade                       1       206 235708 10424
#> <none>                                       235502 10425
#> - ef_money                       1       212 235714 10425
#> - MilitaryExpenditurePercentGDP  1       407 235909 10426
#> - goal7                          1       558 236060 10428
#> - goal8                          1       697 236199 10429
#> - goal1                          1      1013 236515 10432
#> - goal17                         1      1353 236855 10435
#> - pf_movement                    1      1712 237214 10439
#> - goal5                          1      1769 237271 10439
#> - goal13                         1      3365 238867 10454
#> - pf_identity                    1      3391 238893 10454
#> - pf_security                    1      3478 238980 10455
#> - GDPpercapita                   1      3551 239053 10456
#> - ef_government                  1      5339 240842 10472
#> - goal12                         1      6438 241940 10483
#> - goal10                         1     10656 246158 10521
#> - population                     1     12070 247572 10534
#> - internet_usage                 1     14826 250328 10558
#> - unemployment.rate              1     15398 250900 10564
#> - goal11                         1     17680 253182 10584
#> 
#> Step:  AIC=10423
#> goal15 ~ goal1 + goal2 + goal5 + goal7 + goal8 + goal10 + goal11 + 
#>     goal12 + goal13 + goal17 + unemployment.rate + GDPpercapita + 
#>     MilitaryExpenditurePercentGDP + internet_usage + pf_law + 
#>     pf_security + pf_movement + pf_identity + ef_government + 
#>     ef_money + ef_trade + population
#> 
#>                                 Df Sum of Sq    RSS   AIC
#> - pf_law                         1        60 235652 10422
#> - goal2                          1       198 235789 10423
#> <none>                                       235592 10423
#> - ef_trade                       1       213 235804 10423
#> - ef_money                       1       215 235806 10423
#> - MilitaryExpenditurePercentGDP  1       365 235956 10425
#> - goal7                          1       568 236159 10427
#> - goal8                          1       856 236447 10429
#> - goal1                          1      1106 236697 10432
#> - goal17                         1      1276 236867 10433
#> - goal5                          1      1739 237330 10438
#> - pf_movement                    1      2606 238197 10446
#> - pf_identity                    1      3378 238969 10453
#> - goal13                         1      3483 239074 10454
#> - pf_security                    1      3503 239095 10454
#> - GDPpercapita                   1      3547 239139 10455
#> - ef_government                  1      5256 240847 10470
#> - goal12                         1      6866 242457 10485
#> - goal10                         1     10740 246331 10521
#> - population                     1     11986 247577 10532
#> - internet_usage                 1     14757 250348 10557
#> - unemployment.rate              1     15846 251438 10566
#> - goal11                         1     17840 253432 10584
#> 
#> Step:  AIC=10422
#> goal15 ~ goal1 + goal2 + goal5 + goal7 + goal8 + goal10 + goal11 + 
#>     goal12 + goal13 + goal17 + unemployment.rate + GDPpercapita + 
#>     MilitaryExpenditurePercentGDP + internet_usage + pf_security + 
#>     pf_movement + pf_identity + ef_government + ef_money + ef_trade + 
#>     population
#> 
#>                                 Df Sum of Sq    RSS   AIC
#> - ef_trade                       1       191 235843 10422
#> - goal2                          1       191 235843 10422
#> - ef_money                       1       199 235850 10422
#> <none>                                       235652 10422
#> - MilitaryExpenditurePercentGDP  1       332 235983 10423
#> - goal7                          1       604 236256 10426
#> - goal8                          1       797 236449 10427
#> - goal1                          1      1047 236698 10430
#> - goal17                         1      1239 236891 10432
#> - goal5                          1      1772 237423 10437
#> - pf_movement                    1      2556 238207 10444
#> - pf_identity                    1      3456 239108 10452
#> - goal13                         1      3488 239140 10453
#> - GDPpercapita                   1      3713 239365 10455
#> - pf_security                    1      3946 239598 10457
#> - ef_government                  1      5203 240854 10469
#> - goal12                         1      7309 242961 10488
#> - goal10                         1     10707 246359 10519
#> - population                     1     11961 247613 10530
#> - internet_usage                 1     14765 250417 10555
#> - unemployment.rate              1     16829 252481 10573
#> - goal11                         1     18214 253866 10586
#> 
#> Step:  AIC=10422
#> goal15 ~ goal1 + goal2 + goal5 + goal7 + goal8 + goal10 + goal11 + 
#>     goal12 + goal13 + goal17 + unemployment.rate + GDPpercapita + 
#>     MilitaryExpenditurePercentGDP + internet_usage + pf_security + 
#>     pf_movement + pf_identity + ef_government + ef_money + population
#> 
#>                                 Df Sum of Sq    RSS   AIC
#> - ef_money                       1        65 235908 10420
#> - goal2                          1       194 236036 10422
#> <none>                                       235843 10422
#> - MilitaryExpenditurePercentGDP  1       360 236203 10423
#> - goal7                          1       543 236386 10425
#> - goal8                          1      1006 236849 10429
#> - goal1                          1      1073 236916 10430
#> - goal17                         1      1296 237139 10432
#> - goal5                          1      1739 237582 10436
#> - pf_movement                    1      2963 238806 10448
#> - pf_identity                    1      3627 239470 10454
#> - goal13                         1      3655 239498 10454
#> - pf_security                    1      3837 239680 10456
#> - GDPpercapita                   1      3859 239702 10456
#> - ef_government                  1      5035 240878 10467
#> - goal12                         1      8012 243855 10494
#> - goal10                         1     10544 246387 10517
#> - population                     1     12253 248096 10532
#> - internet_usage                 1     14585 250428 10553
#> - unemployment.rate              1     17522 253365 10579
#> - goal11                         1     18112 253955 10584
#> 
#> Step:  AIC=10420
#> goal15 ~ goal1 + goal2 + goal5 + goal7 + goal8 + goal10 + goal11 + 
#>     goal12 + goal13 + goal17 + unemployment.rate + GDPpercapita + 
#>     MilitaryExpenditurePercentGDP + internet_usage + pf_security + 
#>     pf_movement + pf_identity + ef_government + population
#> 
#>                                 Df Sum of Sq    RSS   AIC
#> <none>                                       235908 10420
#> - goal2                          1       227 236135 10421
#> - MilitaryExpenditurePercentGDP  1       355 236262 10422
#> - goal7                          1       529 236436 10423
#> - goal8                          1       990 236898 10428
#> - goal1                          1      1062 236970 10428
#> - goal17                         1      1352 237259 10431
#> - goal5                          1      1709 237616 10434
#> - pf_movement                    1      2909 238817 10446
#> - goal13                         1      3604 239511 10452
#> - pf_identity                    1      3668 239575 10453
#> - GDPpercapita                   1      3826 239733 10454
#> - pf_security                    1      3854 239761 10454
#> - ef_government                  1      5579 241486 10470
#> - goal12                         1      7979 243887 10492
#> - goal10                         1     10529 246437 10516
#> - population                     1     12400 248308 10532
#> - internet_usage                 1     14661 250569 10553
#> - unemployment.rate              1     17510 253417 10578
#> - goal11                         1     18503 254411 10586
# Variance inflation factors of the model retained by step()
vif(selmod)
#>                         goal1                         goal2 
#>                          7.20                          2.04 
#>                         goal5                         goal7 
#>                          3.31                          6.36 
#>                         goal8                        goal10 
#>                          4.59                          2.43 
#>                        goal11                        goal12 
#>                          5.93                         13.53 
#>                        goal13                        goal17 
#>                          6.69                          2.33 
#>             unemployment.rate                  GDPpercapita 
#>                          1.71                          5.07 
#> MilitaryExpenditurePercentGDP                internet_usage 
#>                          1.48                          3.88 
#>                   pf_security                   pf_movement 
#>                          2.02                          2.60 
#>                   pf_identity                 ef_government 
#>                          2.22                          1.77 
#>                    population 
#>                          1.31
# Refit goal15 on a reduced predictor set
reg_goal15_all_new <- lm(goal15 ~ goal1 + goal2 + goal5 + goal8 + goal10 + goal17 + unemployment.rate + 
                           GDPpercapita + internet_usage + pf_law + pf_security + pf_religion + 
                           pf_expression + pf_identity + ef_government + population, data = data_question1)
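
The same three-step pattern (full model, backward `step()`, refit) is repeated for each goal. A possible way to factor it out is sketched below. The helper name `select_backward` is hypothetical, and the example runs on the built-in `mtcars` data rather than `data_question1`. Setting `trace = 0` suppresses the step-by-step tables when only the final model is of interest.

```r
# Hypothetical helper: backward stepwise selection for one response variable
select_backward <- function(response, predictors, data) {
  # Build "response ~ p1 + p2 + ..." from character vectors
  full <- lm(reformulate(predictors, response = response), data = data)
  # lower = ~ 1 keeps at least the intercept; trace = 0 silences the tables
  step(full,
       scope = list(lower = ~ 1, upper = formula(full)),
       direction = "backward",
       trace = 0)
}

# Illustrative call on built-in data
sel <- select_backward("mpg", c("wt", "hp", "qsec", "drat", "gear"), mtcars)
print(formula(sel))
```

Calling such a helper once per goal would give each selection its own object instead of reusing `selmod` and `nullmod` across goals.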

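# reg16: backward stepwise selection for goal16
# The intercept-only lower bound uses the same response as the full model
# (step() only reads the right-hand side of the scope, so the output is unchanged)
nullmod <- lm(goal16 ~ 1, data = data_question1)
selmod <- step(reg_goal16_all_new, scope = list(lower = nullmod, upper = reg_goal16_all_new), direction = "backward")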
#> Start:  AIC=7696
#> goal16 ~ goal1 + goal2 + goal5 + goal6 + goal7 + goal8 + goal10 + 
#>     goal11 + goal12 + goal13 + goal15 + unemployment.rate + GDPpercapita + 
#>     MilitaryExpenditurePercentGDP + internet_usage + pf_law + 
#>     pf_security + pf_movement + pf_religion + pf_expression + 
#>     pf_identity + ef_government + ef_money + ef_trade + ef_regulation + 
#>     population
#> 
#>                                 Df Sum of Sq   RSS  AIC
#> - goal13                         1         1 68945 7694
#> - goal5                          1         2 68946 7694
#> - goal12                         1         9 68953 7694
#> - goal8                          1        13 68957 7694
#> - pf_identity                    1        47 68991 7696
#> <none>                                       68944 7696
#> - goal1                          1       136 69080 7698
#> - MilitaryExpenditurePercentGDP  1       145 69089 7699
#> - ef_trade                       1       175 69119 7700
#> - goal2                          1       238 69182 7702
#> - ef_government                  1       246 69190 7702
#> - ef_money                       1       266 69210 7703
#> - GDPpercapita                   1       400 69344 7707
#> - ef_regulation                  1       403 69347 7707
#> - goal15                         1       448 69392 7708
#> - internet_usage                 1       449 69393 7709
#> - goal7                          1       504 69448 7710
#> - unemployment.rate              1       991 69935 7726
#> - population                     1      1105 70049 7729
#> - pf_movement                    1      1214 70158 7733
#> - pf_religion                    1      2224 71168 7765
#> - goal6                          1      2404 71348 7770
#> - pf_expression                  1      3837 72781 7815
#> - goal10                         1      4374 73318 7831
#> - pf_security                    1      5331 74274 7860
#> - pf_law                         1      5408 74352 7862
#> - goal11                         1      6723 75667 7901
#> 
#> Step:  AIC=7694
#> goal16 ~ goal1 + goal2 + goal5 + goal6 + goal7 + goal8 + goal10 + 
#>     goal11 + goal12 + goal15 + unemployment.rate + GDPpercapita + 
#>     MilitaryExpenditurePercentGDP + internet_usage + pf_law + 
#>     pf_security + pf_movement + pf_religion + pf_expression + 
#>     pf_identity + ef_government + ef_money + ef_trade + ef_regulation + 
#>     population
#> 
#>                                 Df Sum of Sq   RSS  AIC
#> - goal5                          1         3 68947 7692
#> - goal12                         1         9 68954 7692
#> - goal8                          1        12 68957 7692
#> - pf_identity                    1        47 68991 7694
#> <none>                                       68945 7694
#> - goal1                          1       136 69081 7696
#> - MilitaryExpenditurePercentGDP  1       146 69091 7697
#> - ef_trade                       1       175 69119 7698
#> - goal2                          1       244 69189 7700
#> - ef_government                  1       245 69190 7700
#> - ef_money                       1       267 69212 7701
#> - ef_regulation                  1       403 69348 7705
#> - GDPpercapita                   1       404 69349 7705
#> - internet_usage                 1       448 69393 7707
#> - goal15                         1       461 69406 7707
#> - goal7                          1       504 69448 7708
#> - unemployment.rate              1       992 69937 7724
#> - population                     1      1107 70052 7728
#> - pf_movement                    1      1216 70161 7731
#> - pf_religion                    1      2232 71176 7763
#> - goal6                          1      2417 71362 7769
#> - pf_expression                  1      3840 72785 7813
#> - goal10                         1      4375 73320 7829
#> - pf_security                    1      5333 74278 7858
#> - pf_law                         1      5412 74357 7860
#> - goal11                         1      6885 75830 7904
#> 
#> Step:  AIC=7692
#> goal16 ~ goal1 + goal2 + goal6 + goal7 + goal8 + goal10 + goal11 + 
#>     goal12 + goal15 + unemployment.rate + GDPpercapita + MilitaryExpenditurePercentGDP + 
#>     internet_usage + pf_law + pf_security + pf_movement + pf_religion + 
#>     pf_expression + pf_identity + ef_government + ef_money + 
#>     ef_trade + ef_regulation + population
#> 
#>                                 Df Sum of Sq   RSS  AIC
#> - goal12                         1         8 68955 7690
#> - goal8                          1        11 68959 7691
#> - pf_identity                    1        45 68992 7692
#> <none>                                       68947 7692
#> - goal1                          1       137 69084 7695
#> - MilitaryExpenditurePercentGDP  1       149 69096 7695
#> - ef_trade                       1       172 69120 7696
#> - goal2                          1       242 69189 7698
#> - ef_government                  1       246 69194 7698
#> - ef_money                       1       265 69212 7699
#> - ef_regulation                  1       408 69356 7703
#> - GDPpercapita                   1       412 69359 7703
#> - goal15                         1       459 69406 7705
#> - internet_usage                 1       472 69419 7705
#> - goal7                          1       524 69472 7707
#> - unemployment.rate              1       994 69941 7722
#> - population                     1      1116 70063 7726
#> - pf_movement                    1      1222 70169 7729
#> - pf_religion                    1      2238 71185 7761
#> - goal6                          1      2428 71375 7767
#> - pf_expression                  1      3846 72793 7811
#> - goal10                         1      4608 73555 7834
#> - pf_security                    1      5412 74359 7858
#> - pf_law                         1      5521 74468 7862
#> - goal11                         1      7131 76079 7909
#> 
#> Step:  AIC=7690
#> goal16 ~ goal1 + goal2 + goal6 + goal7 + goal8 + goal10 + goal11 + 
#>     goal15 + unemployment.rate + GDPpercapita + MilitaryExpenditurePercentGDP + 
#>     internet_usage + pf_law + pf_security + pf_movement + pf_religion + 
#>     pf_expression + pf_identity + ef_government + ef_money + 
#>     ef_trade + ef_regulation + population
#> 
#>                                 Df Sum of Sq   RSS  AIC
#> - goal8                          1        12 68967 7689
#> - pf_identity                    1        54 69009 7690
#> <none>                                       68955 7690
#> - goal1                          1       131 69086 7693
#> - MilitaryExpenditurePercentGDP  1       144 69099 7693
#> - ef_trade                       1       167 69122 7694
#> - goal2                          1       234 69189 7696
#> - ef_government                  1       258 69213 7697
#> - ef_money                       1       272 69227 7697
#> - ef_regulation                  1       405 69360 7701
#> - internet_usage                 1       481 69436 7704
#> - goal15                         1       487 69442 7704
#> - goal7                          1       536 69491 7706
#> - GDPpercapita                   1       645 69600 7709
#> - unemployment.rate              1       987 69942 7720
#> - population                     1      1135 70091 7725
#> - pf_movement                    1      1243 70198 7728
#> - pf_religion                    1      2261 71216 7760
#> - goal6                          1      2444 71399 7766
#> - pf_expression                  1      3948 72904 7812
#> - goal10                         1      4708 73663 7835
#> - pf_security                    1      5405 74360 7856
#> - pf_law                         1      6691 75646 7895
#> - goal11                         1      7155 76110 7908
#> 
#> Step:  AIC=7689
#> goal16 ~ goal1 + goal2 + goal6 + goal7 + goal10 + goal11 + goal15 + 
#>     unemployment.rate + GDPpercapita + MilitaryExpenditurePercentGDP + 
#>     internet_usage + pf_law + pf_security + pf_movement + pf_religion + 
#>     pf_expression + pf_identity + ef_government + ef_money + 
#>     ef_trade + ef_regulation + population
#> 
#>                                 Df Sum of Sq   RSS  AIC
#> - pf_identity                    1        57 69024 7689
#> <none>                                       68967 7689
#> - goal1                          1       120 69087 7691
#> - MilitaryExpenditurePercentGDP  1       147 69114 7692
#> - ef_trade                       1       157 69124 7692
#> - ef_money                       1       268 69235 7695
#> - goal2                          1       271 69238 7696
#> - ef_government                  1       275 69242 7696
#> - ef_regulation                  1       397 69365 7700
#> - internet_usage                 1       470 69437 7702
#> - goal15                         1       494 69461 7703
#> - goal7                          1       567 69534 7705
#> - GDPpercapita                   1       645 69612 7708
#> - population                     1      1126 70093 7723
#> - unemployment.rate              1      1182 70149 7725
#> - pf_movement                    1      1237 70204 7726
#> - pf_religion                    1      2249 71216 7758
#> - goal6                          1      2432 71399 7764
#> - pf_expression                  1      4146 73113 7817
#> - goal10                         1      4750 73717 7835
#> - pf_security                    1      5402 74369 7855
#> - pf_law                         1      7039 76006 7903
#> - goal11                         1      7362 76329 7913
#> 
#> Step:  AIC=7689
#> goal16 ~ goal1 + goal2 + goal6 + goal7 + goal10 + goal11 + goal15 + 
#>     unemployment.rate + GDPpercapita + MilitaryExpenditurePercentGDP + 
#>     internet_usage + pf_law + pf_security + pf_movement + pf_religion + 
#>     pf_expression + ef_government + ef_money + ef_trade + ef_regulation + 
#>     population
#> 
#>                                 Df Sum of Sq   RSS  AIC
#> <none>                                       69024 7689
#> - goal1                          1       121 69145 7691
#> - ef_trade                       1       134 69158 7691
#> - MilitaryExpenditurePercentGDP  1       134 69158 7691
#> - ef_money                       1       256 69280 7695
#> - goal2                          1       259 69282 7695
#> - ef_government                  1       288 69312 7696
#> - ef_regulation                  1       364 69388 7698
#> - internet_usage                 1       471 69495 7702
#> - goal15                         1       559 69583 7705
#> - goal7                          1       568 69592 7705
#> - GDPpercapita                   1       654 69678 7708
#> - population                     1      1071 70095 7721
#> - unemployment.rate              1      1155 70178 7724
#> - pf_movement                    1      1195 70219 7725
#> - pf_religion                    1      2199 71222 7756
#> - goal6                          1      2830 71854 7776
#> - pf_expression                  1      4089 73113 7815
#> - goal10                         1      4720 73744 7834
#> - pf_security                    1      5345 74369 7853
#> - pf_law                         1      7079 76103 7904
#> - goal11                         1      7892 76916 7928
vif(selmod) 
#>                         goal1                         goal2 
#>                          5.99                          1.99 
#>                         goal6                         goal7 
#>                          5.27                          6.90 
#>                        goal10                        goal11 
#>                          2.46                          5.96 
#>                        goal15             unemployment.rate 
#>                          1.56                          1.50 
#>                  GDPpercapita MilitaryExpenditurePercentGDP 
#>                          3.62                          1.35 
#>                internet_usage                        pf_law 
#>                          4.28                          6.50 
#>                   pf_security                   pf_movement 
#>                          2.16                          3.89 
#>                   pf_religion                 pf_expression 
#>                          4.15                          5.05 
#>                 ef_government                      ef_money 
#>                          1.93                          2.73 
#>                      ef_trade                 ef_regulation 
#>                          4.08                          2.42 
#>                    population 
#>                          1.44
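
The AIC column in the stepwise traces above can be checked by hand. For linear models, `step()` relies on `extractAIC()`, which computes n * log(RSS / n) + 2 * k, where k counts the estimated coefficients including the intercept. The sketch below reproduces the final goal16 step (RSS = 69024, 21 predictors). It assumes n = 2,226 observations, the count reported in the regression tables; the helper name `aic_from_rss` is ours and is not part of the original script.

```r
# Hypothetical helper: AIC as reported by step() for an lm fit.
# rss    : residual sum of squares of the candidate model
# n      : number of observations (assumed 2226 here)
# n_coef : number of estimated coefficients, intercept included
aic_from_rss <- function(rss, n, n_coef) {
  n * log(rss / n) + 2 * n_coef
}

# Final goal16 model of the trace: 21 predictors + intercept = 22 coefficients
aic_goal16 <- aic_from_rss(rss = 69024, n = 2226, n_coef = 22)
print(round(aic_goal16))
```

Because removing a predictor lowers the penalty by 2 but raises the RSS, a variable is dropped only when the loss of fit is small enough for the total to decrease, which is why the procedure stops once `<none>` tops the table.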
reg_goal16_all_new <- lm(goal16 ~ goal1 + goal2 + goal5 + goal8 + goal10 + goal13 + 
                           unemployment.rate + GDPpercapita + MilitaryExpenditurePercentGDP + 
                           internet_usage + pf_law + pf_security + pf_movement + pf_religion + 
                           pf_expression + pf_identity + ef_government + ef_money + 
                           ef_regulation + population, data = data_question1)
selmod <- step(reg_goal16_all_new, scope=list(lower=nullmod, upper=reg_goal16_all_new), direction="backward") 
#> Start:  AIC=7983
#> goal16 ~ goal1 + goal2 + goal5 + goal8 + goal10 + goal13 + unemployment.rate + 
#>     GDPpercapita + MilitaryExpenditurePercentGDP + internet_usage + 
#>     pf_law + pf_security + pf_movement + pf_religion + pf_expression + 
#>     pf_identity + ef_government + ef_money + ef_regulation + 
#>     population
#> 
#>                                 Df Sum of Sq   RSS  AIC
#> - goal13                         1        38 78886 7982
#> - goal8                          1        43 78892 7982
#> <none>                                       78848 7983
#> - internet_usage                 1       189 79038 7986
#> - MilitaryExpenditurePercentGDP  1       253 79101 7988
#> - ef_money                       1       377 79225 7991
#> - goal5                          1       789 79637 8003
#> - pf_movement                    1       795 79643 8003
#> - goal2                          1       836 79684 8004
#> - ef_regulation                  1       847 79695 8005
#> - GDPpercapita                   1      1002 79850 8009
#> - unemployment.rate              1      1107 79956 8012
#> - pf_identity                    1      1183 80031 8014
#> - ef_government                  1      1222 80070 8015
#> - goal1                          1      1707 80555 8029
#> - pf_religion                    1      2895 81743 8061
#> - goal10                         1      3671 82519 8082
#> - population                     1      3971 82820 8090
#> - pf_security                    1      4158 83006 8095
#> - pf_expression                  1      4512 83361 8105
#> - pf_law                         1      8331 87179 8204
#> 
#> Step:  AIC=7982
#> goal16 ~ goal1 + goal2 + goal5 + goal8 + goal10 + unemployment.rate + 
#>     GDPpercapita + MilitaryExpenditurePercentGDP + internet_usage + 
#>     pf_law + pf_security + pf_movement + pf_religion + pf_expression + 
#>     pf_identity + ef_government + ef_money + ef_regulation + 
#>     population
#> 
#>                                 Df Sum of Sq   RSS  AIC
#> - goal8                          1        34 78920 7981
#> <none>                                       78886 7982
#> - internet_usage                 1       172 79058 7985
#> - MilitaryExpenditurePercentGDP  1       236 79122 7987
#> - ef_money                       1       380 79267 7991
#> - goal5                          1       754 79640 8001
#> - pf_movement                    1       777 79664 8002
#> - ef_regulation                  1       830 79717 8003
#> - goal2                          1       944 79831 8006
#> - GDPpercapita                   1      1052 79939 8009
#> - unemployment.rate              1      1117 80003 8011
#> - pf_identity                    1      1153 80039 8012
#> - ef_government                  1      1206 80093 8014
#> - goal1                          1      1670 80556 8027
#> - pf_religion                    1      2861 81747 8059
#> - goal10                         1      3634 82520 8080
#> - population                     1      3974 82860 8089
#> - pf_security                    1      4180 83067 8095
#> - pf_expression                  1      4478 83365 8103
#> - pf_law                         1      8425 87311 8206
#> 
#> Step:  AIC=7981
#> goal16 ~ goal1 + goal2 + goal5 + goal10 + unemployment.rate + 
#>     GDPpercapita + MilitaryExpenditurePercentGDP + internet_usage + 
#>     pf_law + pf_security + pf_movement + pf_religion + pf_expression + 
#>     pf_identity + ef_government + ef_money + ef_regulation + 
#>     population
#> 
#>                                 Df Sum of Sq   RSS  AIC
#> <none>                                       78920 7981
#> - internet_usage                 1       153 79073 7983
#> - MilitaryExpenditurePercentGDP  1       245 79166 7986
#> - ef_money                       1       392 79313 7990
#> - pf_movement                    1       762 79682 8000
#> - goal5                          1       784 79704 8001
#> - ef_regulation                  1       827 79747 8002
#> - goal2                          1      1044 79965 8008
#> - GDPpercapita                   1      1048 79968 8008
#> - pf_identity                    1      1181 80101 8012
#> - ef_government                  1      1265 80185 8014
#> - unemployment.rate              1      1274 80195 8015
#> - goal1                          1      1915 80835 8032
#> - pf_religion                    1      2839 81759 8058
#> - goal10                         1      3682 82602 8080
#> - population                     1      3942 82862 8087
#> - pf_security                    1      4191 83111 8094
#> - pf_expression                  1      4790 83710 8110
#> - pf_law                         1      9028 87948 8220
vif(selmod)#pf_law
#>                         goal1                         goal2 
#>                          3.13                          1.86 
#>                         goal5                        goal10 
#>                          2.82                          2.22 
#>             unemployment.rate                  GDPpercapita 
#>                          1.37                          3.48 
#> MilitaryExpenditurePercentGDP                internet_usage 
#>                          1.37                          3.91 
#>                        pf_law                   pf_security 
#>                          6.18                          2.16 
#>                   pf_movement                   pf_religion 
#>                          3.79                          4.27 
#>                 pf_expression                   pf_identity 
#>                          5.04                          2.22 
#>                 ef_government                      ef_money 
#>                          1.93                          2.03 
#>                 ef_regulation                    population 
#>                          2.28                          1.38
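
The values returned by `vif()` can be read as follows: the VIF of a predictor equals 1 / (1 - R²), where R² comes from regressing that predictor on all the other predictors. A minimal sketch on the built-in `mtcars` data (not on `data_question1`) shows the computation by hand; the variable choice is purely illustrative, and any cut-off used to decide which variable to drop is a judgment call rather than a fixed rule.

```r
# Toy illustration on mtcars; the report itself uses data_question1.
predictors <- c("wt", "hp", "disp")

vif_by_hand <- sapply(predictors, function(p) {
  others <- setdiff(predictors, p)
  # Auxiliary regression: predictor p explained by the remaining predictors
  aux <- lm(reformulate(others, response = p), data = mtcars)
  1 / (1 - summary(aux)$r.squared)
})

print(round(vif_by_hand, 2))

# Candidate for removal: the predictor with the largest VIF
worst <- names(which.max(vif_by_hand))
print(worst)
```

After removing the most collinear predictor, the model is refitted and the VIFs are recomputed, since dropping one variable changes the VIF of all the others.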
reg_goal16_all_new <- lm(goal16 ~ goal1 + goal2 + goal5 + goal8 + goal10 + goal13 + unemployment.rate + 
  GDPpercapita + MilitaryExpenditurePercentGDP + pf_security + 
  pf_movement + pf_religion + pf_expression + pf_identity + 
  ef_government + ef_money + ef_regulation + population, data = data_question1)
selmod <- step(reg_goal16_all_new, scope=list(lower=nullmod, upper=reg_goal16_all_new), direction="backward") 
#> Start:  AIC=8211
#> goal16 ~ goal1 + goal2 + goal5 + goal8 + goal10 + goal13 + unemployment.rate + 
#>     GDPpercapita + MilitaryExpenditurePercentGDP + pf_security + 
#>     pf_movement + pf_religion + pf_expression + pf_identity + 
#>     ef_government + ef_money + ef_regulation + population
#> 
#>                                 Df Sum of Sq   RSS  AIC
#> <none>                                       87515 8211
#> - goal13                         1       196 87711 8214
#> - ef_money                       1       253 87768 8215
#> - goal8                          1       409 87924 8219
#> - goal5                          1       488 88003 8221
#> - MilitaryExpenditurePercentGDP  1       622 88138 8225
#> - goal2                          1       719 88234 8227
#> - pf_movement                    1       747 88263 8228
#> - pf_identity                    1      1546 89062 8248
#> - pf_religion                    1      1789 89304 8254
#> - goal1                          1      2010 89525 8260
#> - GDPpercapita                   1      2733 90248 8277
#> - ef_government                  1      2919 90434 8282
#> - ef_regulation                  1      2919 90434 8282
#> - unemployment.rate              1      3078 90593 8286
#> - goal10                         1      4225 91740 8314
#> - population                     1      4495 92010 8320
#> - pf_security                    1      8074 95589 8405
#> - pf_expression                  1      8297 95813 8411
vif(selmod) #pf_law
#>                         goal1                         goal2 
#>                          3.21                          2.00 
#>                         goal5                         goal8 
#>                          2.70                          4.80 
#>                        goal10                        goal13 
#>                          2.21                          4.28 
#>             unemployment.rate                  GDPpercapita 
#>                          1.78                          3.52 
#> MilitaryExpenditurePercentGDP                   pf_security 
#>                          1.37                          2.02 
#>                   pf_movement                   pf_religion 
#>                          3.78                          4.22 
#>                 pf_expression                   pf_identity 
#>                          4.93                          2.24 
#>                 ef_government                      ef_money 
#>                          1.87                          2.01 
#>                 ef_regulation                    population 
#>                          2.05                          1.40
reg_goal16_all_new <- lm(goal16 ~ goal1 + goal2 + goal5 + goal8 + goal10 + goal13 + unemployment.rate + 
                           GDPpercapita + MilitaryExpenditurePercentGDP + pf_security + 
                           pf_movement + pf_religion + pf_expression + pf_identity + 
                           ef_government + ef_money + ef_regulation + population, data = data_question1)

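# reg17: backward selection for goal17
# The null (intercept-only) model uses goal17 as response, matching the full model
nullmod <- lm(goal17 ~ 1, data = data_question1)
selmod <- step(reg_goal17_all_new, scope=list(lower=nullmod, upper=reg_goal17_all_new), direction="backward") 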
#> Start:  AIC=8812
#> goal17 ~ goal1 + goal2 + goal5 + goal6 + goal7 + goal8 + goal10 + 
#>     goal11 + goal12 + goal13 + goal15 + goal16 + unemployment.rate + 
#>     GDPpercapita + MilitaryExpenditurePercentGDP + internet_usage + 
#>     pf_law + pf_security + pf_movement + pf_religion + pf_expression + 
#>     pf_identity + ef_government + ef_money + ef_trade + ef_regulation + 
#>     population
#> 
#>                                 Df Sum of Sq    RSS  AIC
#> - ef_regulation                  1         0 113743 8811
#> - goal8                          1         1 113744 8811
#> - goal6                          1        58 113801 8812
#> <none>                                       113743 8812
#> - ef_trade                       1       215 113958 8815
#> - pf_religion                    1       285 114028 8816
#> - pf_identity                    1       346 114089 8817
#> - ef_money                       1       477 114220 8820
#> - unemployment.rate              1       636 114379 8823
#> - internet_usage                 1       697 114440 8824
#> - goal2                          1      1051 114794 8831
#> - goal15                         1      1190 114933 8834
#> - pf_expression                  1      1837 115580 8846
#> - goal7                          1      2128 115871 8852
#> - pf_security                    1      2146 115889 8852
#> - goal13                         1      3708 117451 8882
#> - goal11                         1      3709 117452 8882
#> - pf_movement                    1      3787 117530 8883
#> - GDPpercapita                   1      3807 117550 8884
#> - pf_law                         1      3935 117678 8886
#> - goal12                         1      5674 119417 8919
#> - goal10                         1      6266 120009 8930
#> - goal5                          1      6364 120107 8932
#> - goal1                          1      6400 120143 8932
#> - population                     1      6581 120324 8936
#> - ef_government                  1      6824 120567 8940
#> - MilitaryExpenditurePercentGDP  1      8879 122622 8978
#> - goal16                         1     10158 123901 9001
#> 
#> Step:  AIC=8810
#> goal17 ~ goal1 + goal2 + goal5 + goal6 + goal7 + goal8 + goal10 + 
#>     goal11 + goal12 + goal13 + goal15 + goal16 + unemployment.rate + 
#>     GDPpercapita + MilitaryExpenditurePercentGDP + internet_usage + 
#>     pf_law + pf_security + pf_movement + pf_religion + pf_expression + 
#>     pf_identity + ef_government + ef_money + ef_trade + population
#> 
#>                                 Df Sum of Sq    RSS  AIC
#> - goal8                          1         1 113744 8809
#> - goal6                          1        58 113802 8810
#> <none>                                       113743 8811
#> - ef_trade                       1       233 113977 8813
#> - pf_religion                    1       285 114029 8814
#> - pf_identity                    1       350 114094 8815
#> - ef_money                       1       477 114220 8818
#> - unemployment.rate              1       636 114380 8821
#> - internet_usage                 1       733 114476 8823
#> - goal2                          1      1052 114796 8829
#> - goal15                         1      1190 114933 8832
#> - pf_expression                  1      1851 115595 8844
#> - pf_security                    1      2146 115890 8850
#> - goal7                          1      2298 116041 8853
#> - goal13                         1      3742 117485 8881
#> - goal11                         1      3769 117512 8881
#> - GDPpercapita                   1      3815 117558 8882
#> - pf_movement                    1      3819 117563 8882
#> - pf_law                         1      4212 117955 8889
#> - goal12                         1      5751 119495 8918
#> - goal10                         1      6298 120042 8928
#> - goal1                          1      6433 120176 8931
#> - goal5                          1      6610 120353 8934
#> - population                     1      6626 120369 8935
#> - ef_government                  1      7303 121047 8947
#> - MilitaryExpenditurePercentGDP  1      8894 122637 8976
#> - goal16                         1     10207 123951 9000
#> 
#> Step:  AIC=8809
#> goal17 ~ goal1 + goal2 + goal5 + goal6 + goal7 + goal10 + goal11 + 
#>     goal12 + goal13 + goal15 + goal16 + unemployment.rate + GDPpercapita + 
#>     MilitaryExpenditurePercentGDP + internet_usage + pf_law + 
#>     pf_security + pf_movement + pf_religion + pf_expression + 
#>     pf_identity + ef_government + ef_money + ef_trade + population
#> 
#>                                 Df Sum of Sq    RSS  AIC
#> - goal6                          1        60 113804 8808
#> <none>                                       113744 8809
#> - ef_trade                       1       235 113980 8811
#> - pf_religion                    1       284 114029 8812
#> - pf_identity                    1       349 114094 8813
#> - ef_money                       1       476 114220 8816
#> - internet_usage                 1       765 114510 8821
#> - unemployment.rate              1       823 114568 8823
#> - goal2                          1      1095 114839 8828
#> - goal15                         1      1190 114934 8830
#> - pf_expression                  1      1919 115663 8844
#> - pf_security                    1      2146 115890 8848
#> - goal7                          1      2313 116058 8851
#> - goal11                         1      3802 117546 8880
#> - GDPpercapita                   1      3815 117560 8880
#> - pf_movement                    1      3823 117567 8880
#> - goal13                         1      3846 117590 8881
#> - pf_law                         1      4276 118020 8889
#> - goal12                         1      5785 119529 8917
#> - goal10                         1      6311 120055 8927
#> - goal5                          1      6637 120382 8933
#> - population                     1      6731 120475 8934
#> - goal1                          1      6743 120487 8935
#> - ef_government                  1      7417 121162 8947
#> - MilitaryExpenditurePercentGDP  1      8899 122644 8974
#> - goal16                         1     10209 123953 8998
#> 
#> Step:  AIC=8808
#> goal17 ~ goal1 + goal2 + goal5 + goal7 + goal10 + goal11 + goal12 + 
#>     goal13 + goal15 + goal16 + unemployment.rate + GDPpercapita + 
#>     MilitaryExpenditurePercentGDP + internet_usage + pf_law + 
#>     pf_security + pf_movement + pf_religion + pf_expression + 
#>     pf_identity + ef_government + ef_money + ef_trade + population
#> 
#>                                 Df Sum of Sq    RSS  AIC
#> <none>                                       113804 8808
#> - ef_trade                       1       244 114049 8810
#> - pf_religion                    1       309 114113 8812
#> - pf_identity                    1       441 114245 8814
#> - ef_money                       1       467 114271 8815
#> - internet_usage                 1       734 114539 8820
#> - unemployment.rate              1       829 114633 8822
#> - goal15                         1      1182 114987 8829
#> - goal2                          1      1259 115063 8830
#> - pf_expression                  1      1907 115712 8843
#> - pf_security                    1      2091 115896 8846
#> - goal7                          1      2267 116072 8850
#> - GDPpercapita                   1      3771 117575 8878
#> - goal11                         1      3791 117595 8879
#> - goal13                         1      3793 117597 8879
#> - pf_movement                    1      3820 117624 8879
#> - pf_law                         1      4228 118032 8887
#> - goal12                         1      5752 119557 8915
#> - goal10                         1      6301 120106 8926
#> - goal5                          1      6577 120382 8931
#> - population                     1      6680 120485 8933
#> - goal1                          1      6753 120557 8934
#> - ef_government                  1      7365 121169 8945
#> - MilitaryExpenditurePercentGDP  1      8952 122756 8974
#> - goal16                         1     10269 124074 8998
vif(selmod) 
#>                         goal1                         goal2 
#>                          7.08                          2.00 
#>                         goal5                         goal7 
#>                          3.19                          6.34 
#>                        goal10                        goal11 
#>                          2.75                          7.18 
#>                        goal12                        goal13 
#>                         16.38                          6.59 
#>                        goal15                        goal16 
#>                          1.67                          6.93 
#>             unemployment.rate                  GDPpercapita 
#>                          1.55                          5.09 
#> MilitaryExpenditurePercentGDP                internet_usage 
#>                          1.44                          4.20 
#>                        pf_law                   pf_security 
#>                          7.96                          2.35 
#>                   pf_movement                   pf_religion 
#>                          3.99                          4.49 
#>                 pf_expression                   pf_identity 
#>                          5.47                          2.41 
#>                 ef_government                      ef_money 
#>                          1.93                          2.76 
#>                      ef_trade                    population 
#>                          4.04                          1.57
reg_goal17_all_new <- lm(goal17 ~ goal1 + goal2 + goal5 + goal10 + goal13 + goal15 + unemployment.rate + 
                           GDPpercapita + MilitaryExpenditurePercentGDP + internet_usage + pf_security + pf_movement + pf_religion + pf_expression +
                           pf_identity + ef_government + ef_money + ef_trade + ef_regulation + 
                           population, data=data_question1)

::: stargazer regressions ::: {.cell layout-align=“center”}

Impact of variables over SDG goals 1,2
Dependent variable:
goal1 goal2
(1) (2)
goal5 -0.218*** 0.109***
(0.032) (0.014)
goal6 0.995*** 0.294***
(0.039) (0.018)
goal8 0.903*** 0.375***
(0.072) (0.034)
goal10 0.182*** 0.017**
(0.016) (0.007)
goal13 -0.208*** 0.136***
(0.033) (0.014)
goal15 -0.254*** -0.044***
(0.028) (0.012)
goal17 0.356*** -0.033*
(0.038) (0.017)
unemployment.rate 112.000*** 14.500***
(7.440) (3.360)
GDPpercapita -0.0003***
(0.00003)
MilitaryExpenditurePercentGDP 1.440*** -0.520***
(0.335) (0.149)
internet_usage 19.800*** 1.740**
(2.080) (0.871)
pf_law -0.462**
(0.205)
pf_security 1.070***
(0.118)
pf_movement 1.350*** -0.849***
(0.360) (0.142)
pf_religion -3.950***
(0.303)
ef_government 4.910***
(0.360)
pf_identity -0.274***
(0.093)
ef_money -0.777** 0.667***
(0.352) (0.157)
ef_trade 2.740*** 0.106
(0.460) (0.208)
ef_regulation -1.250*** -0.742***
(0.441) (0.200)
population 0.000***
(0.000)
Constant -71.400*** 0.280
(7.630) (3.150)
Observations 2,226 2,226
R2 0.788 0.537
Adjusted R2 0.786 0.534
Residual Std. Error 15.000 (df = 2208) 6.660 (df = 2207)
F Statistic 483.000*** (df = 17; 2208) 142.000*** (df = 18; 2207)
Note: *p<0.1; **p<0.05; ***p<0.01

:::

Code
##### geom point #####

# Keep the pairs with |correlation| > 0.8, then plot them

# Filtering values where the absolute value is greater than 0.8
highcorrelations <- melted_corr_matrix_GVar %>% filter(abs(value) > 0.8)

ggplot(data_question1, aes(internet_usage, overallscore)) +
  geom_point() + geom_smooth(se = FALSE) +
  labs(title = "Scatterplot of overall score vs. internet usage")

ggplot(data_question1, aes(GDPpercapita, goal9)) +
  geom_point() + geom_smooth(se = FALSE) +
  labs(title = "Scatterplot of goal9 vs. GDP per capita")

ggplot(data_question1, aes(internet_usage, goal9)) +
  geom_point() + geom_smooth(se = FALSE) +
  labs(title = "Scatterplot of goal9 vs. internet usage")

ggplot(data_question1, aes(ef_legal, goal9)) +
  geom_point() + geom_smooth(se = FALSE) +
  labs(title = "Scatterplot of goal9 vs. ef_legal")

ggplot(data_question1, aes(pf_law, goal16)) +
  geom_point() + geom_smooth(se = FALSE) +
  labs(title = "Scatterplot of goal16 vs. pf_law")

ggplot(data_question1, aes(ef_legal, goal16)) +
  geom_point() + geom_smooth(se = FALSE) +
  labs(title = "Scatterplot of goal16 vs. ef_legal")

Let’s explore how the different SDG scores are correlated with one another by creating a heatmap of their pairwise correlations. We also added a script to check whether each correlation is significantly different from 0. First, let’s select the SDG scores.

Code
sdg_scores <- Q4[, c('goal1', 'goal2', 'goal3', 'goal4', 'goal5', 'goal6',
                     'goal7', 'goal8', 'goal9', 'goal10', 'goal11', 'goal12',
                     'goal13', 'goal15', 'goal16', 'goal17')]

We then initialize the matrices and compute the correlation and its p-value for each pair of SDG scores.

Code
cor_matrix <- matrix(nrow = ncol(sdg_scores), ncol = ncol(sdg_scores))
p_matrix <- matrix(nrow = ncol(sdg_scores), ncol = ncol(sdg_scores))
rownames(cor_matrix) <- colnames(sdg_scores)
rownames(p_matrix) <- colnames(sdg_scores)
colnames(cor_matrix) <- colnames(sdg_scores)
colnames(p_matrix) <- colnames(sdg_scores)

# Calculate correlation and p-values for every pair of SDG scores
# ([[i]] extracts a plain vector whether Q4 is a data.frame or a tibble)
for (i in seq_len(ncol(sdg_scores))) {
  for (j in seq_len(ncol(sdg_scores))) {
    test_result <- cor.test(sdg_scores[[i]], sdg_scores[[j]])
    cor_matrix[i, j] <- test_result$estimate
    p_matrix[i, j] <- test_result$p.value
  }
}

We then reshape our data to be able to use the ggplot2 package to create our heatmap.

Code
melted_cor_matrix <-
  melt(cor_matrix)
melted_p_matrix <-
  melt(matrix(as.vector(p_matrix), nrow = ncol(sdg_scores)))

plot_data <- # Combine the datasets
  cbind(melted_cor_matrix, p_value = melted_p_matrix$value)

ggplot(plot_data, aes(Var1, Var2, fill = value)) +
  geom_tile() +
  geom_text(aes(label = sprintf("%.2f", value), color = p_value < 0.05),
            vjust = 1) +
  scale_fill_gradient2(low = "blue", high = "red", mid = "white", 
                       midpoint = 0, limit = c(-1,1), space = "Lab", 
                       name="Pearson\nCorrelation") +
  # Named values: black when significant (TRUE), yellow if not (FALSE)
  scale_color_manual(values = c("TRUE" = "black", "FALSE" = "yellow")) +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1),
        axis.text.y = element_text(angle = 45, hjust = 1),
        legend.position = "none") +
  labs(x = 'SDG Goals', y = 'SDG Goals',
       title = 'Correlation Matrix with Significance Indicator')

As mentioned above, we tested whether each correlation differs significantly from zero at the 5% significance level. To aid visualization, any correlation that is not significant at this level is printed in yellow on the heatmap. Since no yellow labels appear on our plot, every pairwise correlation between SDG scores is statistically significant at the 5% level.
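
The visual check can be complemented by counting the non-significant pairs directly from the p-value matrix. The sketch below is self-contained and runs on simulated data (the columns `a`, `b`, `c` are made up for illustration); with the report's objects, the same last three lines could be applied to `p_matrix`.

```r
set.seed(1)
n <- 200
a <- rnorm(n)
toy <- data.frame(
  a = a,
  b = a + rnorm(n, sd = 0.5),  # strongly correlated with a by construction
  c = rnorm(n)                 # independent noise
)

# Pairwise p-values, built the same way as p_matrix in the report
vars <- colnames(toy)
p_mat <- matrix(NA_real_, length(vars), length(vars),
                dimnames = list(vars, vars))
for (i in seq_along(vars)) {
  for (j in seq_along(vars)) {
    p_mat[i, j] <- cor.test(toy[[i]], toy[[j]])$p.value
  }
}

# Each pair appears twice in the symmetric matrix: keep the upper triangle only
pairs_p <- p_mat[upper.tri(p_mat)]
n_nonsig <- sum(pairs_p >= 0.05)
cat("Non-significant pairs:", n_nonsig, "of", length(pairs_p), "\n")
```

A count of zero on the real `p_matrix` would confirm what the heatmap suggests. Note that this treats each test separately and does not adjust for the number of pairs tested.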

We can look at the shape of the pairwise relationships between the SDG scores with the plot function, which draws a scatterplot matrix.

Code
plot(sdg_scores)

5.3 Different methods considered

5.4 Competing approaches

5.5 Justifications

6 Conclusion

  • Take home message
  • Limitations
  • Future work?